GOTO is a vendor-independent international software development conference with more than 90 top speakers and 1,300 attendees. The conference covers topics such as .Net, Java, Open Source, Agile, Architecture and Design, Web, Cloud, New Languages and Processes.

Siddharth Anand, Data Architect at Agari Inc

Biography: Siddharth Anand

Siddharth “Sid” Anand is a hands-on software architect with deep experience building and scaling websites that millions of people visit every day. He currently serves as the Data Architect for Agari, a rising email security company. Before joining Agari, Sid held several technical and leadership positions, including LinkedIn’s Search Architect, Netflix’s Cloud Data Architect, Etsy’s VP of Engineering, and several technical roles at eBay. He has over 15 years of experience building such high-traffic sites. He also provides start-ups with technical advisory services in the areas of big data, scalability, availability, and performance. Sid earned his BS and MS degrees in CS from Cornell University, where he focused on Distributed Systems. When not working, Sid enjoys spending time with his wife and son.

Twitter: @r39132

Presentation: Resilient Predictive Data Pipelines

Time: Wednesday 17:10 - 18:00 / Location: Grand Ballroom A & B

Big Data companies (e.g., LinkedIn, Facebook, Google, and Twitter) have historically built custom data pipelines on bare metal in custom-designed data centers. To meet strict requirements for data security, fault tolerance, cost control, job scalability, and uptime, they need to manage their core technology closely. Just as serving systems (e.g., web application servers and OLTP databases) must be up 24x7 to display content to users, data pipelines must stay up and running to pick the most engaging and up-to-date content to display. In other words, updated ranking models, new content recommendations, and the like are what make data pipelines an integral part of an end user’s web experience. We call these predictive data pipelines because their output is source data annotated with ranking or classification data. At the heart of these systems lies Airflow, Airbnb’s thriving open-source workflow scheduler. Come to this talk to learn how Agari leverages Airflow and other best practices from both Cloud (AWS SNS, SQS, Kinesis, Auto Scaling, S3, Lambda, etc.) and Big Data (Spark, Airflow, Avro, etc.) to build its fault-tolerant predictive data pipeline.