H2OWorld has ended
Wednesday, November 11 • 3:00pm - 3:30pm
Benchmarking open source ML platforms


Binary classification is one of the most widely used machine learning methods in business applications. When the feature set is not very large and sparse, algorithms such as random forests, gradient boosted trees, or deep neural networks (and ensembles of those) are expected to perform best in terms of accuracy. There are countless off-the-shelf open source implementations of these algorithms (e.g. R packages, Python scikit-learn, H2O, xgboost, Spark MLlib), but which one should you use in practice? Surprisingly, even the most commonly used implementations of the same algorithm vary hugely in scalability, speed, and accuracy. In this talk we will see which open source tools work reasonably well on the larger datasets commonly encountered in practice.

 

Speakers
Szilard Pafka

Chief Data Scientist, Epoch
Szilard studied physics in the 90s and obtained a PhD using statistical methods to investigate the risk of financial portfolios. He then worked in a bank, quantifying and managing market risk. About a decade ago he moved to California to become the Chief Scientist of a credit...


Wednesday November 11, 2015 3:00pm - 3:30pm PST
Ramanujan Stage