Talk: Reproducible Data Analysis in Python

Speaker: Chris Choy

Language: Cantonese (with English Slides)

Level: Intermediate

About Speaker

Chris is currently a Senior Computational Scientist at ClusterTech, where he has worked on tailored data mining solutions for clients in the banking and retail industries. Prior to joining ClusterTech in 2014, he earned his DPhil at Oxford with a research focus on high-dimensional statistics.

About the Topic

Data analysis consists of multiple steps: data cleansing, modelling and reporting. A reproducible analysis is a set of scripts that executes the whole data pipeline, from raw data to reports. Reproducibility makes it easy to trace the details of an analysis, adapt to changes and provide an updated analysis when new data arrives. This talk will discuss the tools for a simple reproducible analysis: data manipulation in Pandas, templating with Jinja2, embedding plots from Bokeh and coordinating the various scripts with a Makefile.
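
To make the idea of a raw-data-to-report pipeline concrete, here is a minimal sketch, not the speaker's code. It assumes a tiny, made-up sales dataset and deliberately uses only the Python standard library as stand-ins: `csv` in place of Pandas and `string.Template` in place of Jinja2, with no plotting step. The names `RAW_CSV`, `clean`, `summarise`, `render` and `build_report` are hypothetical.

```python
import csv
import io
from collections import defaultdict
from string import Template

# Hypothetical raw data; a real pipeline would read a file from disk.
RAW_CSV = """region,sales
North,100
South,
North,50
South,70
"""

# Stand-in for a Jinja2 template: placeholders are filled in by render().
REPORT_TEMPLATE = Template("Sales report\n$rows\nTotal: $total")


def clean(raw_text):
    """Cleansing step: parse the CSV and drop rows with a missing sales figure."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return [
        {"region": row["region"], "sales": float(row["sales"])}
        for row in reader
        if row["sales"].strip()
    ]


def summarise(rows):
    """Modelling step (here just an aggregate): total sales per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["sales"]
    return dict(sorted(totals.items()))


def render(totals):
    """Reporting step: fill the template with the computed figures."""
    lines = "\n".join(f"{region}: {amount:.1f}" for region, amount in totals.items())
    grand_total = f"{sum(totals.values()):.1f}"
    return REPORT_TEMPLATE.substitute(rows=lines, total=grand_total)


def build_report(raw_text):
    """Run the whole pipeline; the same input always yields the same report."""
    return render(summarise(clean(raw_text)))


if __name__ == "__main__":
    print(build_report(RAW_CSV))
```

Because every step is a plain function of its input, rerunning the script on updated raw data regenerates the report with no manual editing. A build tool such as Make can then rerun only the steps whose inputs have changed.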

Tags: Data, Reproducible Research