[jira] [Commented] (HIVE-15573) Vectorization: ACID shuffle ReduceSink is not specialized

Gopal V (JIRA) Tue, 24 Jan 2017 18:46:42 -0800

    [ https://issues.apache.org/jira/browse/HIVE-15573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15837074#comment-15837074 ]


Gopal V commented on HIVE-15573:
--------------------------------

[~mmccline]: LGTM, +1 with tests pending.

Nit on the {{LOG.debug()}} calls: wrap the ones that call into {{Arrays}} in an
{{isDebugEnabled()}} check.
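The reason for the guard is that Java evaluates log arguments eagerly, so an {{Arrays}} formatting call runs even when debug output is discarded. A minimal sketch, assuming a hypothetical two-method {{Log}} interface (SLF4J's {{Logger}} has methods with these names) and a hypothetical {{describe}} helper that counts how often the formatting actually runs:

```java
import java.util.Arrays;

public class DebugGuard {
    // Hypothetical minimal logger so the sketch has no dependencies;
    // SLF4J's Logger exposes isDebugEnabled() and debug(String) as well.
    interface Log {
        boolean isDebugEnabled();
        void debug(String message);
    }

    // Counts how often the expensive formatting actually ran.
    static int formatCalls = 0;

    // Hypothetical helper standing in for an Arrays.toString(...) argument.
    static String describe(int[] columnMap) {
        formatCalls++;
        return Arrays.toString(columnMap);
    }

    static void logColumnMap(Log log, int[] columnMap) {
        // Without this check the array would be formatted on every call,
        // even when the logger throws the message away.
        if (log.isDebugEnabled()) {
            log.debug("key column map " + describe(columnMap));
        }
    }
}
```

With debug disabled, {{describe}} is never called, so per-row or per-batch code paths pay only for a boolean check.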

There needs to be a guard-rail that checks the two enums together, in one place: not
all combinations in the {{BucketNumKind}} x {{PartitionHashCodeKind}} matrix are valid.
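One way to build such a guard-rail is a single static check that every construction path calls. The sketch below is only illustrative: the enum constants and the validity rule are hypothetical (the patch's real constants may differ). The rule itself comes from the issue description, which says bucketing demands the Writable hashcode, so a bucketed sink is taken to be incompatible with the murmur hash:

```java
public class ReduceSinkKindCheck {
    // Hypothetical constants; the real enums in the patch may differ.
    enum BucketNumKind { NONE, COMPUTED }
    enum PartitionHashCodeKind { NONE, MURMUR, WRITABLE }

    // The one place where the BucketNumKind x PartitionHashCodeKind matrix is validated.
    static void checkCombination(BucketNumKind bucket, PartitionHashCodeKind partition) {
        final boolean valid;
        switch (bucket) {
            case COMPUTED:
                // Illustrative rule: a bucketed (ACID) sink needs the Writable
                // hash code, so the murmur hash is rejected.
                valid = partition != PartitionHashCodeKind.MURMUR;
                break;
            case NONE:
                valid = true;
                break;
            default:
                throw new IllegalStateException("Unexpected BucketNumKind " + bucket);
        }
        if (!valid) {
            throw new IllegalStateException(
                "Invalid combination: " + bucket + " x " + partition);
        }
    }
}
```

Because the check lives in one method, adding a new enum constant means revisiting one switch instead of hunting for scattered conditionals.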

Also, final variables in the loop are very useful for catching issues at compile
time. Moving these declarations into the loop and making them final means the
compiler ensures that no state is left over from a previous row and that every
branch assigns every variable.

{code}
+      int batchIndex;
+      int bucketNum;
+      int hashCode;
+      int keyLength;
{code}
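A minimal sketch of the per-row final-locals pattern, assuming a hypothetical helper that maps one batch of long keys to bucket numbers. The {{keys}} and {{isNull}} arrays stand in for a vectorized column, while {{selected}}, {{selectedInUse}} and {{size}} follow the naming of Hive's {{VectorizedRowBatch}}; the sign-masking modulo is one common way to keep a bucket number non-negative, not necessarily the patch's formula:

```java
public class FinalLocalsInLoop {
    // Hypothetical helper, not Hive code: maps the rows of one batch to bucket numbers.
    static int[] bucketsFor(long[] keys, boolean[] isNull, int[] selected,
                            boolean selectedInUse, int size, int numBuckets) {
        final int[] bucketNums = new int[size];
        for (int logical = 0; logical < size; logical++) {
            // Each per-row value is a final local declared inside the loop:
            // javac rejects the method if any branch leaves one unassigned,
            // and no value can carry over from the previous row.
            final int batchIndex = selectedInUse ? selected[logical] : logical;
            final int hashCode;
            if (isNull[batchIndex]) {
                hashCode = 0;
            } else {
                hashCode = Long.hashCode(keys[batchIndex]);
            }
            // Clearing the sign bit keeps the bucket number non-negative.
            final int bucketNum = (hashCode & Integer.MAX_VALUE) % numBuckets;
            bucketNums[logical] = bucketNum;
        }
        return bucketNums;
    }
}
```

If the {{else}} branch is deleted, javac refuses to compile and reports that {{hashCode}} might not have been initialized. With the variables declared outside the loop, the stale value from the previous row would silently be reused instead.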

> Vectorization: ACID shuffle ReduceSink is not specialized 
> ----------------------------------------------------------
>
>                 Key: HIVE-15573
>                 URL: https://issues.apache.org/jira/browse/HIVE-15573
>             Project: Hive
>          Issue Type: Improvement
>          Components: Transactions, Vectorization
>    Affects Versions: 2.2.0
>            Reporter: Gopal V
>            Assignee: Matt McCline
>             Fix For: 2.2.0
>
>         Attachments: HIVE-15573.01.patch, HIVE-15573.02.patch, 
> HIVE-15573.03.patch, screenshot-1.png
>
>
> The ACID shuffle disabled the murmur hash because the bucketing requirements
> demand the Writable hashcode for the shuffle.
> {code}
>     boolean useUniformHash = desc.getReducerTraits().contains(UNIFORM);
>     if (!useUniformHash) {
>       return false;
>     }
> {code}
> This check prevents the fast ReduceSink ops from being used for ACID inserts.
> A specialized case for the following pattern will make ACID inserts much
> faster.
> {code}
>                     Reduce Output Operator
>                       sort order: 
>                       Map-reduce partition columns: _col0 (type: bigint)
>                       value expressions:  ....
> {code}
> !screenshot-1.png!
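The quoted description comes down to a dispatch decision. The existing check gives up on the fast ReduceSink whenever the UNIFORM trait is missing, while the proposed specialization would also match the ACID pattern of an empty sort order with partition columns. A minimal sketch of that decision, assuming hypothetical {{ReducerTrait}} and {{SinkKind}} enums and plain counts in place of Hive's descriptor objects; it illustrates the pattern match only and is not the patch's logic:

```java
import java.util.Set;

public class ReduceSinkSelector {
    // Hypothetical stand-in for reducer traits; only UNIFORM matters here.
    enum ReducerTrait { UNIFORM, FIXED }

    // Hypothetical operator choices, not Hive class names.
    enum SinkKind { FAST_UNIFORM_HASH, FAST_PARTITION_ONLY, NON_SPECIALIZED }

    static SinkKind choose(Set<ReducerTrait> traits, int sortKeyCount, int partitionColumnCount) {
        if (traits.contains(ReducerTrait.UNIFORM)) {
            // Murmur-hashed shuffle: the existing fast path.
            return SinkKind.FAST_UNIFORM_HASH;
        }
        if (sortKeyCount == 0 && partitionColumnCount > 0) {
            // The ACID insert pattern: empty sort order, partitioned on columns
            // whose Writable hash codes must be preserved for bucketing.
            return SinkKind.FAST_PARTITION_ONLY;
        }
        // Everything else keeps the conservative behaviour of the original check.
        return SinkKind.NON_SPECIALIZED;
    }
}
```

Keeping the specialized branch narrow (no sort keys, at least one partition column) means any plan outside that shape still takes the conservative path.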



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
