Real-time Map Reduce: Exploring Clickstream Analytics with Spark Streaming, Kafka and WebSockets

Scale
05/27/2014 - 17:20 to 18:00
Frannz Club
long talk (40 min)

Session abstract: 

Spark Streaming is an extension to Apache Spark that lets users seamlessly mix streaming, batch and interactive queries through a new programming model. Couple this with strong consistency and efficient fault recovery, and the opportunities to build robust streaming analytics systems are limited only by imagination. In this talk I'll show you how to use Apache Kafka in practice to store clickstream data, Apache Spark Streaming to perform clickstream analysis, and WebSockets to stream the results out to a client. I will do my best to whet your appetite for what is possible with Apache Spark Streaming within a 45-60 minute talk.
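The abstract describes a three-stage pipeline: Kafka holds the raw clicks, Spark Streaming aggregates them, and WebSockets push the results to a client. The following is a minimal, library-free Python sketch of the middle "real-time map reduce" step, assuming each click carries a timestamp, a user ID and a page. It does not use the Spark, Kafka or WebSocket APIs. ClickEvent, micro_batches, page_view_counts and to_ws_message are hypothetical names, and the JSON message shape is an assumption. In Spark Streaming itself this step roughly corresponds to a `map` followed by `reduceByKey` on a DStream of click records, evaluated once per batch interval.

```python
import json
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class ClickEvent:
    # Hypothetical clickstream record; these field names are assumptions,
    # not a schema taken from the talk.
    timestamp: float  # seconds since the epoch
    user_id: str
    page: str


def micro_batches(events, batch_seconds):
    """Split events into fixed-length time buckets, mimicking a
    Spark Streaming batch interval. Yields (batch_start, events)."""
    ordered = sorted(events, key=lambda ev: ev.timestamp)
    for bucket_id, group in groupby(
        ordered, key=lambda ev: int(ev.timestamp // batch_seconds)
    ):
        yield bucket_id * batch_seconds, list(group)


def page_view_counts(batch):
    """Map each click to (page, 1), then reduce by key with addition."""
    mapped = [(ev.page, 1) for ev in batch]  # map phase
    counts = {}
    for page, one in mapped:  # reduce-by-key phase
        counts[page] = counts.get(page, 0) + one
    return counts


def to_ws_message(batch_start, counts):
    """Serialize one batch result as the JSON text a WebSocket server
    might push to browser clients (message shape is an assumption)."""
    return json.dumps({"batchStart": batch_start, "pageViews": counts},
                      sort_keys=True)


if __name__ == "__main__":
    # In the real pipeline these events would be consumed from a Kafka topic;
    # here they are hard-coded sample data.
    clicks = [
        ClickEvent(0.5, "u1", "/home"),
        ClickEvent(1.2, "u2", "/home"),
        ClickEvent(1.9, "u1", "/cart"),
        ClickEvent(2.4, "u3", "/home"),
        ClickEvent(3.1, "u2", "/checkout"),
    ]
    for start, batch in micro_batches(clicks, batch_seconds=2):
        print(to_ws_message(start, page_view_counts(batch)))
```

With a 2-second interval the sample clicks fall into two batches, so the script prints one JSON message per batch: first `/home` viewed twice and `/cart` once, then `/home` and `/checkout` once each. Each batch is aggregated independently here; keeping running totals across batches would need a stateful step (Spark Streaming offers `updateStateByKey` for that), which this sketch leaves out.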