GOTO Amsterdam (June 17-19, 2015) is a vendor-independent international software development conference with more than 50 top speakers and 500 attendees. The conference covers topics such as AngularJS, Disruption, Docker, Drones, Elasticsearch, Hadoop, Microservices & Scrum.
Friso van Vollenhoven, CTO of GoDataDriven
Biography: Friso van Vollenhoven
Friso is CTO of GoDataDriven. With a background in software engineering, he is currently active in the area that overlaps systems engineering, software engineering and applied, large-scale data processing. He is a long-time Hadoop user, chair of the Hadoop track at the GOTO conference in Amsterdam, and organiser of the Amsterdam Applied Machine Learning meetup group and the Dutch Hadoop User Group.
Twitter: @fzk
Presentation: Big Data in the Browser - Live Coding Terabytes in a Notebook
The web browser has become the de facto user interface for pretty much anything that isn't consumed on a smartphone. Yet, as software engineers we are more inclined to work with text editors and operating system shells, because these allow us to write code instead of clicking on things. Here's news for you: you can do that in the browser too! Plus, the browser can directly show interactive visualisations, neatly laid-out output and code commentary with markup. Put all of this on top of Hadoop and you can interactively code against big data sets to extract information and create visualisations on the fly.
In this talk and live coding session, we'll show you how to use the popular browser-based notebook environment Jupyter on top of Hadoop/Spark to process large data sets with Scala or Python for data visualisations and interactive reports. Examples include performing web analytics on clickstream data and mining Stack Overflow data for trending topics.
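To give a flavour of the live-coding format, here is a minimal sketch of the kind of notebook cell involved: counting page views per URL in clickstream data with PySpark. The HDFS path and the tab-separated log layout are hypothetical placeholders, not the actual demo data set.

# Minimal sketch: page views per URL from clickstream logs with PySpark.
# Path and record layout (tab-separated: timestamp, user, url) are assumed.
from pyspark import SparkContext

sc = SparkContext(appName="clickstream-demo")  # in Jupyter, `sc` is often pre-created

lines = sc.textFile("hdfs:///data/clickstream/*.tsv")

page_views = (lines
    .map(lambda line: line.split("\t"))          # parse each record
    .filter(lambda fields: len(fields) == 3)     # drop malformed lines
    .map(lambda fields: (fields[2], 1))          # key by URL
    .reduceByKey(lambda a, b: a + b))            # count views per URL

# Pull only the top results back to the notebook for display.
for url, views in page_views.takeOrdered(10, key=lambda kv: -kv[1]):
    print(url, views)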
Workshop: Data Science on Hadoop
In this full-day workshop on Data Science using Apache Hadoop, you will learn how to work with large data sets and extract meaningful information from them, as well as apply machine learning models to build data-driven functionality. You will work on a real-world, substantially large data set on a full-blown Hadoop cluster (running in the cloud).
We will start off with an introduction to the activities of a data scientist and some of the concepts involved. In the first part we will get hands-on with exploratory data analysis on a large data set using Apache Hadoop, Apache Spark and Python. In the second part we will create a full-blown data science solution using a large data set and machine learning models.
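As a rough illustration of the second part, the sketch below fits a simple classifier with Spark's MLlib (the RDD-based API current at the time). The input path, CSV layout and label column are assumptions for illustration only; the workshop's actual data set and models differ.

# Hedged sketch: training and evaluating a classifier with MLlib.
# Assumes CSV rows of: label, feature1, feature2, ...
from pyspark import SparkContext
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.classification import LogisticRegressionWithSGD

sc = SparkContext(appName="workshop-model")

def to_labeled_point(line):
    values = [float(x) for x in line.split(",")]
    return LabeledPoint(values[0], values[1:])

data = sc.textFile("hdfs:///data/training.csv").map(to_labeled_point)
train, test = data.randomSplit([0.8, 0.2], seed=42)

model = LogisticRegressionWithSGD.train(train, iterations=100)

# Simple accuracy check on the held-out split.
correct = test.map(lambda lp: model.predict(lp.features) == lp.label).filter(bool).count()
print("accuracy:", correct / float(test.count()))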
This workshop focusses on hands-on work with these subjects rather than on theory.
Learning outcomes:
- Understand the Data Science process
- Basic use of some Data Science tools for Big (and smaller) Data
- Basic use of Apache Hadoop and Apache Spark
- Data visualisation for exploratory analysis (see the sketch after this list)
- Basic knowledge of machine learning models
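As an illustration of the visualisation outcome, a notebook cell like the following pulls a small aggregate back to the driver and plots it inline. The page_views RDD refers to the hypothetical aggregate from the earlier clickstream sketch.

# Illustrative only: inline plotting of aggregated Spark output in Jupyter.
%matplotlib inline  # Jupyter magic; renders plots inside the notebook
import matplotlib.pyplot as plt

top = page_views.takeOrdered(10, key=lambda kv: -kv[1])  # small result set to driver
urls, counts = zip(*top)

plt.figure(figsize=(10, 4))
plt.barh(range(len(urls)), counts)
plt.yticks(range(len(urls)), urls)
plt.xlabel("page views")
plt.title("Top 10 pages by views")
plt.show()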
Target Audience
Software engineers who want to get hands-on with data science. Coding skills are required. No prior knowledge of data science or machine learning is expected. Some experience in Python is helpful, but not a necessity.
Technical Requirements
You need a laptop that allows SSH access to a server and has a web browser. Additionally, a text editor can come in handy.
The workshop is limited to 20 attendees.