Harnessing the power of YARN with Apache Twill

05/26/2014 - 16:40 to 17:20
Frannz Club
long talk (40 min)

Session abstract: 

When Apache Hadoop was first released as open source, it focused on implementing Google's Map/Reduce, a framework for batch processing of very large files in a distributed system. Built to run on large clusters of commodity hardware, Hadoop also included a cluster resource manager that divided the cluster's capacity among the Map/Reduce jobs running at any given time.

A Hadoop cluster, however, is not always fully utilized, and idle resources would best be used for other compute-intensive tasks such as real-time stream processing, message passing, or graph algorithms. Unfortunately, the cluster resource manager was specialized for Map/Reduce execution and did not allow other types of workloads.

This situation changes with Apache Hadoop 2.0 and its resource manager, YARN, which is decoupled from the Map/Reduce execution engine. YARN allows running arbitrary workloads on a cluster, as long as they are built against its application manager interface. It manages the cluster's resources as a set of "containers": each application obtains containers from YARN and is then free to use them for any type of computation. Hence different types of distributed applications can share a single cluster, which allows for more innovation, agility, and better hardware utilization.

However, YARN's power and flexibility come with complexity, which can make getting started challenging, especially for application developers who are familiar with Java but have no experience with Hadoop. Apache Twill makes YARN more accessible to these developers through an abstraction layer built over YARN that makes writing distributed applications as simple as programming with threads. Twill also has built-in support for real-time log collection, application lifecycle management, and network service discovery, which greatly reduces the pain developers face in developing, debugging, deploying, and monitoring applications.

The talk gives an introduction to Hadoop, YARN, and Twill, and illustrates the use of YARN and Twill with programming examples.