Presentation: "Fast Analytics on Big Data"

Track: Leading & Bleeding Edge - Day 1 / Time: Monday 11:30 - 12:20 / Location: Auditorium, AROS

We have built an open-source platform for dealing with in-memory distributed data. We've used it to build state-of-the-art predictive modeling and analytics (e.g. GLM & Logistic Regression, GBM, Random Forest, Neural Nets, and PCA, to name a few) that are 1000x faster than the disk-bound alternatives and 100x faster than R (we love R, but it's too slow on big data!). We can run R expressions on tera-scale datasets, or munge data from Scala & Python. We're building our newest algorithms in a few weeks, start to finish, because the platform makes Big Math easy. We routinely test on 100GB datasets and have customers using 1TB datasets.

This talk is about the platform, coding style & API that let us seamlessly deal with datasets from 1KB to 1TB without changing a line of code, and let us use clusters ranging from your laptop to 100-server clusters with many TB of RAM and hundreds of CPUs.

Cliff Click, 0xdata CTO

Cliff Click is the CTO and Co-Founder of 0xdata, makers of H2O - the fast, scalable, open-source in-memory machine learning platform for Big Data. Cliff wrote his first compiler when he was 15 (Pascal to TRS Z-80!), although his most famous compiler is the HotSpot Server Compiler (the Sea of Nodes IR). He helped Azul Systems build an 864-core pure-Java mainframe that keeps GC pauses on 500GB heaps to under 10ms, and worked on all aspects of that JVM. Before that, Cliff worked on HotSpot at Sun Microsystems, and is at least partially responsible for bringing Java into the mainstream.

Cliff is regularly invited to speak at industry and academic conferences and has published many papers about HotSpot technology. He holds a PhD in Computer Science from Rice University and about 15 patents.