[ https://issues.apache.org/jira/browse/HIVE-15573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15837074#comment-15837074 ]
Gopal V commented on HIVE-15573:
--------------------------------

[~mmccline]: LGTM - +1 tests pending.

Nits on the LOG.debug() calls: wrap the ones that make Arrays calls in an isDebugEnabled check.

There needs to be a guard-rail that checks the 2 enums together, in one place. Not all combinations in the {{BucketNumKind}} x {{PartitionHashCodeKind}} matrix are valid.

Also, final variables in the loop are very useful for catching issues ahead of time. Moving these into the loop and making them final means the compiler ensures there is no left-over state from a previous row and that all branches assign all variables.

{code}
+    int batchIndex;
+    int bucketNum;
+    int hashCode;
+    int keyLength;
{code}

> Vectorization: ACID shuffle ReduceSink is not specialized
> ----------------------------------------------------------
>
>                 Key: HIVE-15573
>                 URL: https://issues.apache.org/jira/browse/HIVE-15573
>             Project: Hive
>          Issue Type: Improvement
>          Components: Transactions, Vectorization
>    Affects Versions: 2.2.0
>            Reporter: Gopal V
>            Assignee: Matt McCline
>             Fix For: 2.2.0
>
>         Attachments: HIVE-15573.01.patch, HIVE-15573.02.patch, 
> HIVE-15573.03.patch, screenshot-1.png
>
>
> The ACID shuffle disabled murmur hash for the shuffle, due to the bucketing 
> requirements demanding the writable hashcode for the shuffles.
> {code}
>     boolean useUniformHash = desc.getReducerTraits().contains(UNIFORM);
>     if (!useUniformHash) {
>       return false;
>     }
> {code}
> This check protects the fast ReduceSink ops from being used in ACID inserts.
> A specialized case for the following pattern will make ACID insert much 
> faster.
> {code}
>         Reduce Output Operator
>           sort order: 
>           Map-reduce partition columns: _col0 (type: bigint)
>           value expressions: ....
> {code}
> !screenshot-1.png!
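
The three points in the review comment (guarding debug logging, validating the two enums in one place, and declaring per-row variables as finals inside the loop) can be sketched roughly as below. This is an illustration under stated assumptions, not the patch itself: only the names {{BucketNumKind}}, {{PartitionHashCodeKind}}, batchIndex, bucketNum, hashCode and keyLength come from the comment. The enum constants, the rule for which combination is invalid, the class name ReduceSinkSketch and the methods checkKinds and processBatch are all hypothetical. java.util.logging's isLoggable(Level.FINE) stands in for the isDebugEnabled check so that the sketch needs no external logging library.

```java
import java.util.Arrays;
import java.util.logging.Level;
import java.util.logging.Logger;

public class ReduceSinkSketch {
  // Hypothetical constants: only the enum names appear in the review comment.
  public enum BucketNumKind { NONE, EVALUATE }
  public enum PartitionHashCodeKind { NONE, EVALUATE }

  private static final Logger LOG =
      Logger.getLogger(ReduceSinkSketch.class.getName());

  // Single guard-rail: every combination of the two enums is checked here.
  // The rule below is an assumption made purely for illustration.
  public static void checkKinds(BucketNumKind b, PartitionHashCodeKind p) {
    if (b == BucketNumKind.EVALUATE && p == PartitionHashCodeKind.NONE) {
      throw new IllegalStateException("Invalid combination: " + b + " x " + p);
    }
  }

  // Returns the bucket number computed for each row (-1 when not evaluated).
  public static int[] processBatch(int[][] keys, int numBuckets,
      BucketNumKind b, PartitionHashCodeKind p) {
    checkKinds(b, p);
    final int[] result = new int[keys.length];
    for (int i = 0; i < keys.length; i++) {
      // Declared final inside the loop: nothing carries over from the
      // previous row, and the compiler rejects any branch that forgets
      // to assign one of these before it is read.
      final int batchIndex = i;
      final int keyLength = keys[batchIndex].length;
      final int hashCode;
      final int bucketNum;

      if (p == PartitionHashCodeKind.EVALUATE) {
        hashCode = Arrays.hashCode(keys[batchIndex]);
      } else {
        hashCode = 0;
      }

      if (b == BucketNumKind.EVALUATE) {
        bucketNum = Math.floorMod(hashCode, numBuckets);
      } else {
        bucketNum = -1;
      }

      // Guarded logging: Arrays.toString() only runs when the level is on.
      if (LOG.isLoggable(Level.FINE)) {
        LOG.fine("row " + batchIndex + " key " + Arrays.toString(keys[batchIndex])
            + " keyLength " + keyLength + " hashCode " + hashCode
            + " bucketNum " + bucketNum);
      }
      result[batchIndex] = bucketNum;
    }
    return result;
  }
}
```

Removing either else branch makes the sketch fail to compile with a "variable might not have been initialized" error, which is the compile-time check the comment asks for.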