Transactions and Abstractions for Apache HBase

05/26/2014 - 12:50 to 13:10
short talk (20 min)

Session abstract: 

While the demand for real-time, data-driven apps and insights continues to expand at a rapid pace, the tools and skills needed to develop Big Data applications have remained in the realm of a specialized few.  As a central component for most real-time applications on Hadoop, HBase is powerful and flexible, but its APIs are low-level and expose the data model and the complex distributed nature of the system.  This means that building apps on HBase effectively requires specialized knowledge of distributed systems, databases, and HBase itself.  As a result, developers face a steep learning curve and end up implementing similar home-grown solutions to the same problems again and again, expending effort on infrastructure and middleware rather than the application itself.

In this talk, Jon will introduce a newly open-sourced (Apache 2.0) project that combines a distributed transaction engine with a data pattern abstraction layer to bring the power of application development on HBase to all Java developers.  This project includes implementations of many common and complex data patterns like TimeSeries, SecondaryIndexes, various ORMs, real-time OLAP Cube, GeoSpatial indexes, SearchIndex, etc.  Through these data abstractions and the added flexibility and consistency guarantees provided by transactions, application developers are freed from dealing with the inner workings and optimizations of HBase and can focus on the business logic around their data.

This project is being released as open source with the goal of increasing the usability and adoption of HBase while building a community around the development of a definitive library of data patterns and best practices implemented on HBase.

The talk will dive into architectural details of the transaction engine, examples of the most popular data pattern implementations, and an example production use case to bring it all together.  At the close, developers will be given details on how they can try it out for themselves and how to contribute to the growing library of data patterns.