[ https://issues.apache.org/jira/browse/HIVE-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787165#comment-13787165 ]

Lefty Leverenz commented on HIVE-4160:
--------------------------------------

Put it in Design Docs 
(https://cwiki.apache.org/confluence/display/Hive/DesignDocs) until it's 
released.  Later you can move it into the User Docs with a note about which 
release introduces it.  You can either change the file's location in the 
hierarchy or leave it in place and just link to it from the User Docs section.

When it goes into User Docs, you have some choices.  Does it belong on the Home 
page or in the Language Manual?  If in the Language Manual, do you want it 
under DML or should it be a stand-alone doc?  That depends on what you write 
and how you want readers to find the doc.  You can always add links from other 
docs to make sure people find it.

Here's the Language Manual:  
https://cwiki.apache.org/confluence/display/Hive/LanguageManual.

Of course, configuration properties go here, perhaps in a subsection under Query 
Execution:  
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties.  I 
suggest you make a section in your design doc that's formatted to match the 
configuration doc, so when the time comes you can just cut & paste.

> Vectorized Query Execution in Hive
> ----------------------------------
>
>                 Key: HIVE-4160
>                 URL: https://issues.apache.org/jira/browse/HIVE-4160
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Jitendra Nath Pandey
>            Assignee: Jitendra Nath Pandey
>         Attachments: Hive-Vectorized-Query-Execution-Design.docx, 
> Hive-Vectorized-Query-Execution-Design-rev10.docx, 
> Hive-Vectorized-Query-Execution-Design-rev10.docx, 
> Hive-Vectorized-Query-Execution-Design-rev10.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev11.docx, 
> Hive-Vectorized-Query-Execution-Design-rev11.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev2.docx, 
> Hive-Vectorized-Query-Execution-Design-rev3.docx, 
> Hive-Vectorized-Query-Execution-Design-rev3.docx, 
> Hive-Vectorized-Query-Execution-Design-rev3.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev4.docx, 
> Hive-Vectorized-Query-Execution-Design-rev4.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev5.docx, 
> Hive-Vectorized-Query-Execution-Design-rev5.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev6.docx, 
> Hive-Vectorized-Query-Execution-Design-rev6.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev7.docx, 
> Hive-Vectorized-Query-Execution-Design-rev8.docx, 
> Hive-Vectorized-Query-Execution-Design-rev8.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev9.docx, 
> Hive-Vectorized-Query-Execution-Design-rev9.pdf
>
>
> The Hive query execution engine currently processes one row at a time. A 
> single row of data goes through all the operators before the next row can be 
> processed. This mode of processing is very inefficient in terms of CPU usage. 
> Research has demonstrated that this yields very low instructions per cycle 
> [MonetDB X100]. Also, Hive currently relies heavily on lazy deserialization 
> and data columns go through a layer of object inspectors that identify column 
> type, deserialize data and determine appropriate expression routines in the 
> inner loop. These layers of virtual method calls further slow down the 
> processing.
>
> This work will add support for vectorized query execution to Hive, where, 
> instead of individual rows, batches of about a thousand rows at a time are 
> processed. Each column in the batch is represented as a vector of a primitive 
> data type. The inner loop of execution scans these vectors very fast, 
> avoiding method calls, deserialization, unnecessary if-then-else, etc. This 
> substantially reduces CPU time used, and gives excellent instructions per 
> cycle (i.e. improved processor pipeline utilization). See the attached design 
> specification for more details.
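To make the contrast in the description concrete, here is a minimal Java sketch of both strategies for a query like SELECT SUM(x) FROM t WHERE x > threshold. Everything in it (VectorizedSketch, FieldAccessor, LongColumn, Batch, filterGreaterThan, sumSelected, vectorizedSum) is hypothetical and heavily simplified. It is not Hive's actual API, and the batch size of 1024, the noNulls flag and the selection vector are assumptions of this sketch. The attached design specification remains the authoritative description.

```java
/**
 * Hypothetical, simplified sketch (NOT Hive's actual classes or API) contrasting
 * row-at-a-time evaluation with batch evaluation over primitive column vectors
 * for a query like: SELECT SUM(x) FROM t WHERE x > threshold.
 */
final class VectorizedSketch {

    /** Assumed batch size, chosen to match "about a thousand rows". */
    static final int BATCH_SIZE = 1024;

    /** Stand-in for an object-inspector-style accessor, invoked via a virtual call. */
    interface FieldAccessor {
        Object get(Object[] row);
    }

    /** Row-at-a-time: every row pays for a virtual call, a null check, a cast and unboxing. */
    static long rowAtATimeSum(Object[][] rows, FieldAccessor x, long threshold) {
        long sum = 0;
        for (Object[] row : rows) {
            Object v = x.get(row);
            if (v == null) continue;
            long val = (Long) v;
            if (val > threshold) sum += val;
        }
        return sum;
    }

    /** One column of a batch, stored as a primitive array. */
    static final class LongColumn {
        final long[] vector = new long[BATCH_SIZE];
        final boolean[] isNull = new boolean[BATCH_SIZE];
        boolean noNulls = true; // lets the filter pick a loop with no per-row null check
    }

    /** A batch of rows: column vectors plus a selection vector of surviving rows. */
    static final class Batch {
        int size;                                   // rows currently live in the batch
        final int[] selected = new int[BATCH_SIZE]; // indexes that passed the filter
        boolean selectedInUse;                      // true once a filter has run
        final LongColumn x = new LongColumn();
    }

    /** Vectorized filter x > threshold; assumes no earlier filter ran on this batch. */
    static void filterGreaterThan(Batch b, long threshold) {
        long[] v = b.x.vector;
        int n = 0;
        if (b.x.noNulls) {
            for (int i = 0; i < b.size; i++) {
                if (v[i] > threshold) b.selected[n++] = i;
            }
        } else {
            boolean[] isNull = b.x.isNull;
            for (int i = 0; i < b.size; i++) {
                if (!isNull[i] && v[i] > threshold) b.selected[n++] = i;
            }
        }
        b.size = n;
        b.selectedInUse = true;
    }

    /** Sums the rows that survived the filter (which already dropped nulls). */
    static long sumSelected(Batch b) {
        long[] v = b.x.vector;
        long sum = 0;
        for (int j = 0; j < b.size; j++) sum += v[b.selected[j]];
        return sum;
    }

    /** Feeds the column through filter and aggregate one batch at a time. */
    static long vectorizedSum(long[] xs, boolean[] nulls, long threshold) {
        Batch b = new Batch();
        long total = 0;
        for (int start = 0; start < xs.length; start += BATCH_SIZE) {
            int n = Math.min(BATCH_SIZE, xs.length - start);
            System.arraycopy(xs, start, b.x.vector, 0, n);
            System.arraycopy(nulls, start, b.x.isNull, 0, n);
            b.x.noNulls = true;
            for (int i = 0; i < n; i++) {
                if (nulls[start + i]) { b.x.noNulls = false; break; }
            }
            b.size = n;
            b.selectedInUse = false;
            filterGreaterThan(b, threshold);
            total += sumSelected(b);
        }
        return total;
    }
}
```

For instance, vectorizedSum(new long[] {1, 5, 10}, new boolean[] {false, true, false}, 2) returns 10, the same result rowAtATimeSum gives for the equivalent boxed rows. The design choice the sketch illustrates: the filter records surviving row indexes in a selection vector instead of copying rows, and the per-batch noNulls flag lets the inner loop run with no null test at all, so the hot loops touch only primitive arrays.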



--
This message was sent by Atlassian JIRA
(v6.1#6144)
