[ https://issues.apache.org/jira/browse/HIVE-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786285#comment-13786285 ]
Eric Hanson commented on HIVE-4160:
-----------------------------------

I've been planning to write some user documentation for this feature. Where do you think would be a good spot in the wiki to include it?

> Vectorized Query Execution in Hive
> ----------------------------------
>
> Key: HIVE-4160
> URL: https://issues.apache.org/jira/browse/HIVE-4160
> Project: Hive
> Issue Type: New Feature
> Reporter: Jitendra Nath Pandey
> Assignee: Jitendra Nath Pandey
> Attachments: Hive-Vectorized-Query-Execution-Design.docx,
> Hive-Vectorized-Query-Execution-Design-rev10.docx,
> Hive-Vectorized-Query-Execution-Design-rev10.docx,
> Hive-Vectorized-Query-Execution-Design-rev10.pdf,
> Hive-Vectorized-Query-Execution-Design-rev11.docx,
> Hive-Vectorized-Query-Execution-Design-rev11.pdf,
> Hive-Vectorized-Query-Execution-Design-rev2.docx,
> Hive-Vectorized-Query-Execution-Design-rev3.docx,
> Hive-Vectorized-Query-Execution-Design-rev3.docx,
> Hive-Vectorized-Query-Execution-Design-rev3.pdf,
> Hive-Vectorized-Query-Execution-Design-rev4.docx,
> Hive-Vectorized-Query-Execution-Design-rev4.pdf,
> Hive-Vectorized-Query-Execution-Design-rev5.docx,
> Hive-Vectorized-Query-Execution-Design-rev5.pdf,
> Hive-Vectorized-Query-Execution-Design-rev6.docx,
> Hive-Vectorized-Query-Execution-Design-rev6.pdf,
> Hive-Vectorized-Query-Execution-Design-rev7.docx,
> Hive-Vectorized-Query-Execution-Design-rev8.docx,
> Hive-Vectorized-Query-Execution-Design-rev8.pdf,
> Hive-Vectorized-Query-Execution-Design-rev9.docx,
> Hive-Vectorized-Query-Execution-Design-rev9.pdf
>
>
> The Hive query execution engine currently processes one row at a time. A
> single row of data goes through all the operators before the next row can be
> processed. This mode of processing is very inefficient in terms of CPU usage.
> Research has demonstrated that this yields very low instructions per cycle
> [MonetDB X100].
> Also, Hive currently relies heavily on lazy deserialization, and data
> columns go through a layer of object inspectors that identify the column
> type, deserialize data, and determine appropriate expression routines in the
> inner loop. These layers of virtual method calls further slow down the
> processing.
> This work will add support for vectorized query execution to Hive, where,
> instead of individual rows, batches of about a thousand rows at a time are
> processed. Each column in the batch is represented as a vector of a primitive
> data type. The inner loop of execution scans these vectors very fast,
> avoiding method calls, deserialization, unnecessary if-then-else, etc. This
> substantially reduces CPU time used, and gives excellent instructions per
> cycle (i.e. improved processor pipeline utilization). See the attached design
> specification for more details.
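
For readers who want a concrete picture of the batch-at-a-time model described in the issue, here is a minimal Java sketch. It is only an illustration under stated assumptions, not the design from the attached specification: the class `VectorizedSketch`, the holder `LongBatch`, the method `addColumns`, and the batch size of 1024 are all hypothetical names and values chosen for this example. It shows each column held as a primitive array and an expression (`out = a + b`) evaluated in one tight loop, with no per-row method calls or type checks.

```java
import java.util.Arrays;

public class VectorizedSketch {
    // Assumed batch size; the issue only says "about a thousand rows".
    static final int BATCH_SIZE = 1024;

    /** Hypothetical row batch: every column is a vector of a primitive type. */
    static final class LongBatch {
        final long[] a = new long[BATCH_SIZE];   // input column a
        final long[] b = new long[BATCH_SIZE];   // input column b
        final long[] out = new long[BATCH_SIZE]; // output column for a + b
        int size;                                // number of valid rows in the batch
    }

    /**
     * Evaluates out = a + b for all valid rows of the batch.
     * The loop body touches only primitive arrays, so there are no virtual
     * calls, no deserialization, and no per-row type dispatch.
     */
    static void addColumns(LongBatch batch) {
        final long[] a = batch.a;
        final long[] b = batch.b;
        final long[] out = batch.out;
        final int n = batch.size;
        for (int i = 0; i < n; i++) {
            out[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        LongBatch batch = new LongBatch();
        batch.size = 4;
        for (int i = 0; i < batch.size; i++) {
            batch.a[i] = i;       // 0, 1, 2, 3
            batch.b[i] = 2L * i;  // 0, 2, 4, 6
        }
        addColumns(batch);
        // Prints [0, 3, 6, 9]
        System.out.println(Arrays.toString(Arrays.copyOf(batch.out, batch.size)));
    }
}
```

In a row-at-a-time engine the same addition would be reached through a chain of operator and object-inspector calls for every single row; in the sketch that overhead is paid once per batch, which is the effect the description attributes to vectorization.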