Wednesday, November 11 • 2:00pm - 2:30pm
Sparkling Water on the Spark Notebook: Interactive Genomes clustering - Xavier Tordoir

H2O provides advanced machine learning capabilities that scale to large datasets. Interoperability between H2O and general-purpose large-scale data processing frameworks such as Apache Spark is essential for data scientists to work efficiently, and this is where Sparkling Water shines. The last piece is the ability to work interactively on data from a single environment, where data scientists can share their results and code. We present the Spark Notebook working with Sparkling Water to bring the H2O libraries to the Spark environment. We show a genomics data processing case: Spark and the ADAM genomics library give efficient access to raw data through domain-specific objects, data preparation is done with Spark, and deep learning from H2O is used to compute a model for population stratification within the set of genomes under investigation.
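
As a rough illustration of the data preparation step mentioned above, the sketch below pivots per-sample genotype calls into one feature row per genome, which is the kind of table a clustering or deep learning model could consume. It is plain Python, not the Spark, ADAM, or H2O code from the talk. The record layout, the sample and variant names, and the alternate-allele-count encoding are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical genotype records: (sample_id, variant_id, alleles), where
# alleles holds 0 for the reference allele and 1 for the alternate allele.
records = [
    ("sampleA", "chr1:1000", (0, 1)),
    ("sampleA", "chr1:2000", (1, 1)),
    ("sampleB", "chr1:1000", (0, 0)),
    ("sampleB", "chr1:2000", (0, 1)),
]


def genotype_matrix(records):
    """Pivot genotype records into one feature row per sample."""
    # Fix a stable column order so every sample row lines up.
    variants = sorted({variant for _, variant, _ in records})
    column = {variant: i for i, variant in enumerate(variants)}
    # Assumption: a variant with no call for a sample stays at 0.
    rows = defaultdict(lambda: [0] * len(variants))
    for sample, variant, alleles in records:
        # Feature value = number of alternate alleles (0, 1 or 2).
        rows[sample][column[variant]] = sum(alleles)
    return variants, dict(rows)


variants, matrix = genotype_matrix(records)
```

In a distributed setting the same pivot would be expressed as a group-by on the sample identifier, with each resulting row passed on to the model as a numeric feature vector.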

Xavier Tordoir

Founder, Data Fellas, Inc.
Xavier started his career as a researcher in experimental physics, with a focus on data processing. He later took part in projects in finance, genomics and software development for academic research. During that time, he worked on time series and on the prediction of biological molecular structures and interactions, and applied machine learning methodologies. He developed solutions to manage and process data distributed across data...

Wednesday November 11, 2015 2:00pm - 2:30pm
Erdos Stage