Presentation: "Solving classical data analytics tasks using modern distributed databases"

Track: The State of Data / Time: Tuesday 11:30 - 12:20 / Location: Grandball

NoSQL databases have limited query languages that are not suited to analytical queries. The classic solution offered by most of them is Hadoop integration, which is not fast. As a result, a number of fast, distributed, parallel query and computation engines have appeared recently to address Hadoop's performance problems.
The presentation will show how to solve classical data analytics tasks with modern distributed databases and in-memory engines, using Spark and Cassandra as examples. It will cover the following topics:
  • Apache Spark benefits, architecture and Scala API (don't be afraid of Scala; we are here to help you)
  • Loading and storing data in the Cassandra NoSQL database
  • Data enrichment and joins
  • Spark machine learning and graph algorithms
Target audience: Software engineers and solution architects who use or plan to use NoSQL products for analytics, particularly Cassandra and Spark.

Artem Aliev, Software Engineer at DataStax

Artem Aliev is a software developer on the DataStax Enterprise Analytics team. He works on integrating the Apache Cassandra NoSQL database with analytics solutions such as Spark and Hive. Before that, he worked as a Big Data solution architect, as a developer of the Apache Harmony J2SE implementation, and as the lead of a performance optimisation team for enterprise storage software at EMC Corporation. So he can talk about the big data processing pipeline: from data on disks to machine learning and visualisation.

Twitter: @__ali