GOTO is a vendor-independent international software development conference with more than 90 top speakers and 1,300 attendees. The conference covers topics such as .NET, Java, Open Source, Agile, Architecture and Design, Web, Cloud, New Languages, and Processes.
Paco Nathan, Databricks
Biography: Paco Nathan
Paco Nathan is a "player/coach" who has led innovative data teams building large-scale apps for several years, with expertise in distributed systems, machine learning, functional programming, and cloud computing. Paco is an O'Reilly author, an Apache Spark open source evangelist with Databricks, and an advisor for Amplify Partners and GalvanizeU. He received his BS Math Sci and MS Comp Sci degrees from Stanford University, and has 30+ years of technology industry experience ranging from Bell Labs to early-stage start-ups.
Twitter: @pacoid
Workshop: Intro to Apache Spark
The Introduction to Apache Spark tutorial is for developers who already work in Python, Java, or Scala and want to learn the core Spark APIs. This workshop features hands-on technical exercises to get up to speed using Spark for data exploration, analysis, and building Big Data applications.
Topics covered include:
- creating a notebook or installing Spark locally
- pre-flight check: initial Spark coding exercise
- Spark Deconstructed: RDDs, lazy-eval, and what happens on a cluster
- A Brief History: motivations for Spark and its context in Big Data
- progressive coding exercises: word count (WC), Join, Workflow
- Spark Essentials: context, driver, transformations, actions, persistence, etc.
- combining SQL, Streaming, Machine Learning, and Graph processing into unified pipelines
- review/analysis of case studies for production deployments of Spark in industry
- further resources for learning about Spark, dev community, prep for certification exam, etc.
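To give a feel for two ideas the workshop covers, lazy evaluation with transformations vs. actions and the word count (WC) exercise, here is a minimal plain-Python sketch. It does not use Spark: `ToyRDD`, `flat_map`, `reduce_by_key`, and `collect` here are hypothetical, single-machine stand-ins chosen for illustration. In PySpark the corresponding RDD methods are `flatMap`, `map`, `reduceByKey`, and `collect`, and the work is distributed across a cluster rather than run in one Python list.

```python
class ToyRDD:
    """Hypothetical single-machine stand-in for a Spark RDD (not the Spark API).

    Transformations only record a step and return a new ToyRDD;
    nothing is computed until an action such as collect() is called.
    """

    def __init__(self, source, steps=()):
        self._source = source       # original in-memory data
        self._steps = tuple(steps)  # pending transformations (lazy)

    # --- transformations: build up the pipeline, do no work yet ---
    def flat_map(self, f):
        return ToyRDD(self._source, self._steps + (("flat_map", f),))

    def map(self, f):
        return ToyRDD(self._source, self._steps + (("map", f),))

    def reduce_by_key(self, f):
        return ToyRDD(self._source, self._steps + (("reduce_by_key", f),))

    # --- action: evaluate every recorded step in order ---
    def collect(self):
        data = list(self._source)
        for kind, f in self._steps:
            if kind == "flat_map":
                data = [y for x in data for y in f(x)]
            elif kind == "map":
                data = [f(x) for x in data]
            elif kind == "reduce_by_key":
                acc = {}
                for k, v in data:
                    acc[k] = f(acc[k], v) if k in acc else v
                data = list(acc.items())
        return data


# Word count: split lines into words, pair each word with 1, sum per word.
lines = ToyRDD(["to be or not to be", "to see"])
counts = (lines.flat_map(str.split)
               .map(lambda w: (w, 1))
               .reduce_by_key(lambda a, b: a + b))
# Nothing has been computed yet; collect() triggers the whole pipeline.
print(sorted(counts.collect()))
# → [('be', 2), ('not', 1), ('or', 1), ('see', 1), ('to', 3)]
```

Because each transformation returns a new object instead of mutating the old one, `lines` can still be reused in other pipelines. This mirrors, in a very simplified way, why Spark RDDs are immutable and why a chain of transformations is cheap to define until an action runs it.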
Prerequisites:
- some programming experience in Python, Java, or Scala
- some familiarity with Big Data use cases and issues
- laptop with wifi + browser