[ https://issues.apache.org/jira/browse/HIVE-9824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14501804#comment-14501804 ]

Matt McCline commented on HIVE-9824:
------------------------------------

Unfortunately, the nature of vectorization is that the moment you try to abstract 
and encapsulate to reduce duplication, you heavily impact performance.  Each of 
the cases needs to be expanded out separately for good performance.
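To make "expanded out" concrete, here is a minimal Java sketch, not Hive code. The LongColumn class with its noNulls/isRepeating/isNull fields, and the selected/selectedInUse parameters, are simplified stand-ins loosely modeled on Hive's ColumnVector and VectorizedRowBatch; the sum method is hypothetical. Each flag combination gets its own tight loop, so the per-row work never re-tests a flag or calls through an abstraction layer.

```java
// Minimal sketch, NOT Hive's code. LongColumn and sum() are hypothetical,
// loosely modeled on Hive's ColumnVector flags and batch selection vector.
final class ExpandedCases {

    /** Hypothetical simplified long column with null/repeat flags. */
    static final class LongColumn {
        final long[] vector;
        final boolean[] isNull;
        boolean noNulls = true;
        boolean isRepeating = false;

        LongColumn(int capacity) {
            vector = new long[capacity];
            isNull = new boolean[capacity];
        }
    }

    /**
     * Sums the non-null values of the logical rows. When selectedInUse is true the
     * logical rows are selected[0..size), otherwise 0..size. Every flag combination
     * has its own loop, so no flag is checked inside a loop body.
     */
    static long sum(LongColumn col, int[] selected, boolean selectedInUse, int size) {
        if (size == 0) {
            return 0;
        }
        if (col.isRepeating) {
            // Slot 0 stands for every row in the batch.
            return (!col.noNulls && col.isNull[0]) ? 0 : col.vector[0] * size;
        }
        long total = 0;
        if (col.noNulls) {
            if (selectedInUse) {
                for (int j = 0; j < size; j++) {
                    total += col.vector[selected[j]];
                }
            } else {
                for (int i = 0; i < size; i++) {
                    total += col.vector[i];
                }
            }
        } else {
            if (selectedInUse) {
                for (int j = 0; j < size; j++) {
                    int i = selected[j];
                    if (!col.isNull[i]) {
                        total += col.vector[i];
                    }
                }
            } else {
                for (int i = 0; i < size; i++) {
                    if (!col.isNull[i]) {
                        total += col.vector[i];
                    }
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        LongColumn c = new LongColumn(4);
        c.vector[0] = 1; c.vector[1] = 2; c.vector[2] = 3; c.vector[3] = 4;
        System.out.println(sum(c, null, false, 4));              // 10
        System.out.println(sum(c, new int[] {1, 3}, true, 2));   // 6
        c.noNulls = false;
        c.isNull[1] = true;
        System.out.println(sum(c, null, false, 4));              // 8
        System.out.println(sum(c, new int[] {1, 3}, true, 2));   // 4
        c.isRepeating = true;
        System.out.println(sum(c, null, false, 4));              // 4
    }
}
```

Even this toy needs four near-identical loops plus a repeating case for just two flags; each further dimension multiplies the count, which is where the duplication comes from.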

Each of the vector join algorithms now has a match phase that collects series of 
equal keys and remembers the small table information, followed by a finish phase 
that outputs the join results.  So the code is actually a lot cleaner than it 
used to be when those two phases were wound together :)
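A minimal sketch of that two-phase shape, assuming a long-keyed inner join with the small table held in a plain java.util.Map from key to its list of values. Hive's actual hash tables and operator classes are different; innerJoin and its parameters are hypothetical. The match phase groups adjacent equal keys into series and probes the small table once per series; the finish phase reads only what the match phase remembered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Minimal sketch, NOT Hive's code: innerJoin and its data layout are hypothetical.
final class TwoPhaseJoin {

    static List<String> innerJoin(long[] keys, String[] bigValues, int size,
                                  Map<Long, List<String>> smallTable) {
        // ---- Match phase: find series of equal adjacent keys, probe once per
        // series, and remember which series matched and their small-table values.
        int[] seriesStart = new int[size];
        int[] seriesLength = new int[size];
        List<List<String>> seriesSmallValues = new ArrayList<>();
        int seriesCount = 0;
        int i = 0;
        while (i < size) {
            int start = i;
            long key = keys[i];
            while (i < size && keys[i] == key) {
                i++;
            }
            List<String> smallValues = smallTable.get(key); // one lookup per series
            if (smallValues != null) {
                seriesStart[seriesCount] = start;
                seriesLength[seriesCount] = i - start;
                seriesSmallValues.add(smallValues);
                seriesCount++;
            }
        }

        // ---- Finish phase: emit output rows from the remembered series only.
        List<String> out = new ArrayList<>();
        for (int s = 0; s < seriesCount; s++) {
            int end = seriesStart[s] + seriesLength[s];
            for (int r = seriesStart[s]; r < end; r++) {
                for (String smallValue : seriesSmallValues.get(s)) {
                    out.add(keys[r] + ":" + bigValues[r] + ":" + smallValue);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Long, List<String>> small = Map.of(1L, List.of("a"), 3L, List.of("x", "y"));
        long[] keys = {1, 1, 2, 3};
        String[] big = {"p", "q", "r", "s"};
        System.out.println(innerJoin(keys, big, keys.length, small));
        // [1:p:a, 1:q:a, 3:s:x, 3:s:y]
    }
}
```

Keeping the phases apart means the probing loop does no output work, and the output loop does no probing.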

I just coded a version of the join algorithms that uses the string templates in 
GenVectorCode.  It is by far the goriest template code I've seen.  I'm not sure it 
is an improvement, so I'm holding it back...
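For context, a sketch of template-driven specialization in general: one template with placeholder tokens is expanded once per type, producing separate specialized classes instead of one abstracted class. The token names, template text, and expand helper below are assumptions for illustration; they are not GenVectorCode's actual templates.

```java
import java.util.Map;

// Sketch of token substitution only; the template and tokens are hypothetical,
// not taken from GenVectorCode.
final class TemplateExpansion {

    static final String TEMPLATE =
          "public final class <ClassName> {\n"
        + "  private final <KeyType>[] keys;\n"
        + "  public <ClassName>(<KeyType>[] keys) { this.keys = keys; }\n"
        + "  public <KeyType> keyAt(int i) { return keys[i]; }\n"
        + "}\n";

    /** Replaces every literal occurrence of each token with its value. */
    static String expand(String template, Map<String, String> tokens) {
        String result = template;
        for (Map.Entry<String, String> e : tokens.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // One expansion per key type yields one specialized class per type.
        System.out.print(expand(TEMPLATE,
            Map.of("<ClassName>", "LongKeySeries", "<KeyType>", "long")));
        System.out.print(expand(TEMPLATE,
            Map.of("<ClassName>", "DoubleKeySeries", "<KeyType>", "double")));
    }
}
```

This is easy for a one-method class; a full join algorithm with many expanded cases turns the template itself into hard-to-read code, which is the trade-off described above.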

> LLAP: Native Vectorization of Map Join so previously CPU bound queries shift 
> their bottleneck to I/O and make it possible for the rest of LLAP to shine ;)
> ----------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-9824
>                 URL: https://issues.apache.org/jira/browse/HIVE-9824
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-9824.01.patch, HIVE-9824.02.patch, 
> HIVE-9824.04.patch
>
>
> Today's VectorMapJoinOperator is a pass-through that converts each row from a 
> vectorized row batch into a Java Object[] row and passes it to the 
> MapJoinOperator superclass.
> This enhancement creates specialized, optimized vectorized map join operator 
> classes.
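A sketch of the difference described above, assuming a long key column and, for brevity, a small table stored as a sorted long[] searched with Arrays.binarySearch rather than a hash table. rowModeProbe is a hypothetical stand-in for the row-mode per-row path; neither method is Hive's VectorMapJoinOperator or MapJoinOperator code.

```java
import java.util.Arrays;

// Minimal sketch, NOT Hive's code: all method names are hypothetical.
final class PassThroughVsNative {

    /** Hypothetical row-mode entry point: receives one boxed Object[] row at a time. */
    static boolean rowModeProbe(Object[] row, long[] sortedSmallKeys) {
        long key = (Long) row[0]; // unbox the key again
        return Arrays.binarySearch(sortedSmallKeys, key) >= 0;
    }

    /** Pass-through style: materialize each batch row as Object[] and hand it off. */
    static int passThroughCount(long[] keys, int size, long[] sortedSmallKeys) {
        int matches = 0;
        for (int i = 0; i < size; i++) {
            Object[] row = { keys[i] }; // one allocation plus one boxed Long per row
            if (rowModeProbe(row, sortedSmallKeys)) {
                matches++;
            }
        }
        return matches;
    }

    /** Native style: probe directly from the primitive key column. */
    static int nativeCount(long[] keys, int size, long[] sortedSmallKeys) {
        int matches = 0;
        for (int i = 0; i < size; i++) {
            if (Arrays.binarySearch(sortedSmallKeys, keys[i]) >= 0) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        long[] small = {2, 5, 9};      // must be sorted for binarySearch
        long[] keys = {1, 2, 2, 5, 7};
        System.out.println(passThroughCount(keys, keys.length, small)); // 3
        System.out.println(nativeCount(keys, keys.length, small));      // 3
    }
}
```

Both versions find the same matches; the pass-through version pays for an Object[] allocation and a boxed key per row before any join work happens.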



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
