Cluster analysis is a machine learning technique designed to group similar objects or data points together. It has wide-ranging applications, from customer segmentation to the development of recommendation engines. This talk will cover the basics of clustering and offer a practical guide on how to implement it.
There’s a saying that 80% of a data scientist’s time is spent on data preprocessing, and only 20% on modeling and analysis. This talk will therefore also cover the practical considerations and data challenges a machine learning practitioner faces, using a case study based on my data science projects in healthcare.
The following will be covered in this talk:
- An introduction to cluster analysis (and unsupervised machine learning), with some examples of its general applications.
- Practical considerations related to ML projects in general, including challenges associated with data cleaning and feature engineering.
- An ML case study in healthcare, showcasing the problems that data scientists encounter, from obtaining the raw data to transforming it into a form ready for modeling.
Date and Time: October 29, 2022 / 14:00-14:30 (UTC+8)
Language: Cantonese
Speaker: Mr. Warrington Hsu / ComboKid / Hong Kong
Speaker Introduction
Mr. Warrington Hsu
Warrington is a data scientist specializing in the application of machine learning and big data in healthcare. He has 10+ years of experience in machine learning, computational statistics and bioinformatics. Warrington is the co-founder of ComboKid, a startup that helps parents track and improve the developmental health of their children using machine learning. He has built machine learning applications and generated medical knowledge using data sources ranging from free-text clinical notes written by doctors to centralized electronic health record databases covering the Hong Kong population. Warrington’s research in health data science has been published in peer-reviewed academic journals.