Presentation: "Big Data in the Browser - Live Coding Terabytes in a Notebook"

Track: Hadoop / Time: Friday 13:20 - 14:10 / Location: Veilingzaal

The web browser has become the de facto user interface for pretty much anything that isn't consumed on a smartphone. Yet, as software engineers, we are more inclined to work with text editors and operating system shells, because these allow us to write code instead of clicking on things. Here's news for you: you can do that in the browser too! Plus, the browser can directly show interactive visualisations, neatly laid out output and code comments with markup. Put all of this on top of Hadoop and you can interactively code against big data sets to extract information and create visualisations on the fly.

In this talk and live coding session, we'll show you how to use the popular browser-based notebook environment Jupyter on top of Hadoop/Spark to process large data sets with Scala or Python for data visualisations and interactive reports. Examples include performing web analytics on clickstream data and mining Stack Overflow data for trending topics.

Andrew Snare, Big Data Hacker at GoDataDriven

Biography: Andrew Snare

Andrew is a Big Data Hacker at GoDataDriven in Amsterdam, where he works on scalable, data-intensive systems for, amongst others, search, web analytics and online personalisation. Having started programming in AppleSoft on a homemade Apple ][, he's spent more than 15 years behind the keyboard and has the scars to prove it. Andrew is a coder at heart and still enjoys those eureka moments when elegant solutions present themselves. He is also a Cloudera / Datastax certified trainer for the Apache Spark and Apache Cassandra technologies.

Twitter: @asnare

Friso van Vollenhoven, CTO of GoDataDriven

Biography: Friso van Vollenhoven

Friso is CTO of GoDataDriven. With a background in software engineering, he is currently active in the area where systems and software engineering overlap with applied, large-scale data processing. He is a long-time Hadoop user, track chair of the Hadoop track at the GOTO conference in Amsterdam, and organiser of The Amsterdam Applied Machine Learning meetup group and the Dutch Hadoop User Group.

Twitter: @fzk