Code: https://github.com/vionwinnie/pyspark-horse-race-predict
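Typical machine learning courses only show how to train a machine learning model on a CSV file in a Jupyter notebook. In a real business setting, however, once a model is deployed, scoring requires connecting the big-data pipeline to the model. I will show how to build an ETL pipeline with PySpark that pulls and cleans data from a Hive database, passes it to a TensorFlow model to compute results, and stores the output. This talk is an introduction to online machine learning systems and big-data applications. (Spark, neo4j, Python, TensorFlow, ETL, end-to-end machine learning model deployment)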
Language: Cantonese
Date & Time: 7 November 2020, Saturday 14:40 – 15:10.
Speaker: Winnie Yeung
Winnie is a data scientist at Visa in the Bay Area. She has been a Python user since 2016 and holds a master's degree in Analytics from the Georgia Institute of Technology. She currently builds machine learning models to catch fraudsters.
GitHub: http://github.com/vionwinnie
LinkedIn: https://www.linkedin.com/in/winnieyeung/