Wednesday, November 11 • 3:00pm - 3:30pm
Benchmarking open source ML platforms

Binary classification is one of the most widely used machine learning methods in business applications. If the number of features is not very large (as it is with sparse data), algorithms such as random forests, gradient boosted trees, or deep learning neural networks (and ensembles of those) are expected to perform best in terms of accuracy. There are countless off-the-shelf open source implementations of these algorithms (e.g. R packages, Python scikit-learn, H2O, xgboost, Spark MLlib, etc.), but which one should you use in practice? Surprisingly, even the most commonly used implementations of the same algorithm vary hugely in scalability, speed, and accuracy. In this talk we will see which open source tools work reasonably well on the larger datasets commonly encountered in practice.

 

Speakers
Szilard Pafka

Chief Data Scientist, Epoch
Szilard studied physics in the 90s and obtained a PhD using statistical methods to investigate the risk of financial portfolios. He then worked in a bank, quantifying and managing market risk. About a decade ago he moved to California to become the Chief Scientist of a credit card processing company, doing everything data (ETL, analysis, modeling, visualization, machine learning, etc.). He is also the founder/organizer of several data...


Wednesday November 11, 2015 3:00pm - 3:30pm
Ramanujan Stage