Big Data looks tiny from Stratosphere

05/26/2014 - 14:10 to 14:50
Frannz Club
long talk (40 min)

Session abstract: 

Stratosphere is a next-generation, Apache-licensed platform for Big Data analysis. It combines the flexibility and scalability of MapReduce-like systems with a high-performance runtime and automatic optimization technology inspired by MPP databases. Stratosphere offers fluent APIs in Java and Scala that extend the MapReduce model with arbitrarily long programs and additional operators such as join, cogroup, cross, and iterate. Stratosphere’s runtime uses main memory efficiently and spills gradually to disk under memory pressure while maintaining good performance. Stratosphere’s cost-based optimizer automatically picks the best execution strategy for a program, taking data and hardware characteristics into account. Finally, Stratosphere features end-to-end, first-class support for iterative programs, achieving performance similar to Giraph while remaining a general (not graph-specific) system. Stratosphere is compatible with the Hadoop ecosystem, runs on top of YARN, and can use HDFS for data storage. Stratosphere is developed by a growing developer community and is currently seeing its first commercial installations and use cases.
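
For readers unfamiliar with the operators named in the abstract, the following is a minimal sketch of what join, cogroup, cross, and iterate mean, written in plain Python over in-memory lists. It is not Stratosphere's API: the function names, signatures, and sample data are all assumptions made for illustration, and a real system would run these operators in parallel over distributed data.

```python
from collections import defaultdict
from itertools import product


def join(left, right):
    """Pair up values from both inputs that share a key (illustrative only)."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, lv, rv) for key, lv in left for rv in index.get(key, [])]


def cogroup(left, right):
    """Group both inputs by key; each key yields one (key, lefts, rights) record."""
    groups = defaultdict(lambda: ([], []))
    for key, value in left:
        groups[key][0].append(value)
    for key, value in right:
        groups[key][1].append(value)
    return [(key, lefts, rights) for key, (lefts, rights) in sorted(groups.items())]


def cross(left, right):
    """Cartesian product of the two inputs."""
    return list(product(left, right))


def iterate(initial, step, max_iterations):
    """Apply `step` repeatedly until the result stops changing or the limit is hit."""
    current = initial
    for _ in range(max_iterations):
        updated = step(current)
        if updated == current:
            break
        current = updated
    return current


# Hypothetical sample data: (key, value) pairs.
users = [(1, "ada"), (2, "grace")]
orders = [(1, "book"), (1, "pen"), (3, "lamp")]

joined = join(users, orders)
grouped = cogroup(users, orders)
pairs = cross([1, 2], ["a", "b"])
fixpoint = iterate(100, lambda x: x // 2, max_iterations=50)
```

In this sketch, join keeps only keys present in both inputs, whereas cogroup keeps every key and hands both groups (possibly empty) to the caller. The iterate helper stands in for the kind of fixpoint loop that iterative programs rely on.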