Vectorized Query Execution in Apache Hive

Scale

05/26/2014 - 17:30 to 18:10

Maschinenhaus

short talk (20 min)

Session abstract:

Hive's current query execution engine processes one row at a time and passes through layers of virtual method calls in its inner loop. This mode of execution is very inefficient in its use of CPU instruction pipelines, superscalar parallelism, and the L1 cache. Vectorized query execution instead reads batches of rows as column vectors, and each operator processes an entire vector at a time. This vector mode of execution has been shown to be an order of magnitude faster in CPU performance. In Hive, we also gain several-fold improvements by removing the layers of branching and virtual method calls from the inner loop. In this talk we present this work, including its architecture and algorithms, along with the results that have been achieved.
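The abstract contains no code. The Java sketch below illustrates the general idea it describes: rows are read in batches and stored column-wise, and operators work on whole vectors. It makes three assumptions:

- Each column is held as a primitive array.
- A filter loops over the entire batch and records the indices of surviving rows in a selection array.
- An aggregate then consumes only the selected rows.

The names ColumnBatch, filterGreaterThan, sumSelected, and filteredSum, and the batch size of 1024, are hypothetical and chosen for illustration. They are not Hive's actual API.

```java
// Hypothetical, simplified sketch of vectorized filtering and aggregation.
// Class and method names are illustrative only, NOT Hive's actual API.
class VectorizedFilterSketch {

    // Rows per batch (assumed value for this sketch).
    static final int BATCH_SIZE = 1024;

    /** A batch of rows stored column-wise: here, a single long column. */
    static final class ColumnBatch {
        final long[] values = new long[BATCH_SIZE]; // column values for this batch
        final int[] selected = new int[BATCH_SIZE]; // indices of rows that passed filters
        int size;                                   // number of live rows
        boolean selectedInUse;                      // false: rows 0..size-1 are all live
    }

    /**
     * Keeps rows where value > threshold, over the whole batch in one tight loop.
     * The loop body is a compare and an array store with no per-row virtual call,
     * which is the kind of inner loop the abstract argues CPUs execute efficiently.
     */
    static void filterGreaterThan(ColumnBatch batch, long threshold) {
        int newSize = 0;
        if (batch.selectedInUse) {
            // Compact the existing selection in place (newSize never exceeds j).
            for (int j = 0; j < batch.size; j++) {
                int i = batch.selected[j];
                if (batch.values[i] > threshold) {
                    batch.selected[newSize++] = i;
                }
            }
        } else {
            for (int i = 0; i < batch.size; i++) {
                if (batch.values[i] > threshold) {
                    batch.selected[newSize++] = i;
                }
            }
        }
        batch.size = newSize;
        batch.selectedInUse = true;
    }

    /** Sums the live rows of the batch, honoring the selection array. */
    static long sumSelected(ColumnBatch batch) {
        long sum = 0;
        if (batch.selectedInUse) {
            for (int j = 0; j < batch.size; j++) {
                sum += batch.values[batch.selected[j]];
            }
        } else {
            for (int i = 0; i < batch.size; i++) {
                sum += batch.values[i];
            }
        }
        return sum;
    }

    /** Computes SUM(col) WHERE col > threshold, one batch at a time. */
    static long filteredSum(long[] column, long threshold) {
        ColumnBatch batch = new ColumnBatch(); // reused across batches
        long total = 0;
        for (int start = 0; start < column.length; start += BATCH_SIZE) {
            int n = Math.min(BATCH_SIZE, column.length - start);
            System.arraycopy(column, start, batch.values, 0, n);
            batch.size = n;
            batch.selectedInUse = false; // fresh batch: every row is live
            filterGreaterThan(batch, threshold);
            total += sumSelected(batch);
        }
        return total;
    }

    public static void main(String[] args) {
        long[] column = new long[10];
        for (int i = 0; i < column.length; i++) {
            column[i] = i;
        }
        System.out.println(filteredSum(column, 5)); // prints 30 (6 + 7 + 8 + 9)
    }
}
```

Recording survivors in a selection array, rather than copying rows, lets later operators in the sketch skip filtered-out rows without moving any column data. Reusing one batch object avoids per-row allocation.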

Video:

Jitendra Pandey at #bbuzz 2014

Slide:

jitendra_pandey_-_vectorizationpresentation_berlin.pdf

Berlin Buzzwords 2014
