In this talk we’ll explore Apache Spark – the most popular cluster computing framework right now. We’ll look at the improvements that Spark brought over Hadoop MapReduce and what makes Spark so fast; explore the Spark programming model and RDDs; and look at some sample use cases for Spark and big data in general.
This talk will be interesting for people who have little or no experience with Spark and would like to learn more about it. It will also be interesting to a general engineering audience as we’ll go over the Spark programming model and some engineering tricks that make Spark fast.
I graduated from the University of Waterloo, where I did a lot of competitive programming (reached red on TopCoder and made the ACM ICPC finals) and interned at Facebook and LinkedIn on big data and ML projects. I currently work as a Software Engineer at Databricks, the company founded by the creators of Apache Spark.
Company: «Databricks», Netherlands