It is often necessary to manipulate data as fast as possible, whether that means computing a column average or performing a simple join and filter. Servers often lack convenient tools such as Python (numpy/pandas) or R; moreover, the data might not fit into memory. This talk shows how to make fast but inconvenient command-line tools great again. It is based on work done at Yandex and makes it possible to process gigabytes of data with one-liner commands.
Kirill Pavlov
Kirill graduated from the Moscow Institute of Physics and Technology. He started his career as a software developer in the advertising department of Yandex, a publicly listed search engine company. After polishing his engineering and data mining skills, he moved to Hong Kong and led a development team at Multichannel, a demand-side advertising platform. He then joined Asia Miles, part of Cathay Pacific, as a data scientist, where he worked with Apache Spark and XGBoost. Kirill now works at Terminal 1, a boutique IT and analytics recruitment firm, bringing data mining into a traditional industry. He is passionate about learning new things and contributing to open source.