[ https://issues.apache.org/jira/browse/HIVE-21531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824551#comment-16824551 ]
Ashutosh Chauhan commented on HIVE-21531:
-----------------------------------------

+1

> Vectorization: all NULL hashcodes are not computed using Murmur3
> ----------------------------------------------------------------
>
>                 Key: HIVE-21531
>                 URL: https://issues.apache.org/jira/browse/HIVE-21531
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 4.0.0, 3.1.1
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Critical
>         Attachments: HIVE-21531.1.patch, HIVE-21531.1.patch, HIVE-21531.WIP.patch
>
>
> The comments in Vectorized hash computation call out the MurmurHash
> implementation (the one using 0x5bd1e995), while the non-vectorized codepath
> calls out the Murmur3 one (using 0xcc9e2d51).
> The comments here are wrong:
> {code}
>   /**
>    * Batch compute the hash codes for all the serialized keys.
>    *
>    * NOTE: MAJOR MAJOR ASSUMPTION:
>    *     We assume that HashCodeUtil.murmurHash produces the same result
>    *     as MurmurHash.hash with seed = 0 (the method used by ReduceSinkOperator for
>    *     UNIFORM distribution).
>    */
>   protected void computeSerializedHashCodes() {
>     int offset = 0;
>     int keyLength;
>     byte[] bytes = output.getData();
>     for (int i = 0; i < nonNullKeyCount; i++) {
>       keyLength = serializedKeyLengths[i];
>       hashCodes[i] = Murmur3.hash32(bytes, offset, keyLength, 0);
>       offset += keyLength;
>     }
>   }
> {code}
> but the wrong comment is followed in the Vector RS operator:
> {code}
>       System.arraycopy(nullKeyOutput.getData(), 0, nullBytes, 0, nullBytesLength);
>       nullKeyHashCode = HashCodeUtil.calculateBytesHashCode(nullBytes, 0, nullBytesLength);
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
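To make the mismatch concrete, here is a minimal Java sketch of why the two hash families cannot be mixed. It implements the textbook 32-bit MurmurHash2 (multiplier 0x5bd1e995) and MurmurHash3 x86_32 (constants 0xcc9e2d51, 0x1b873593) from their published reference algorithms. It is not Hive's HashCodeUtil or Murmur3 source, and the class name HashMismatchSketch and the 4-byte "null key" array are hypothetical stand-ins, not Hive's real serialized NULL key. The only claim is that the same bytes hash to different values under the two algorithms. A NULL-key row hashed with the MurmurHash2-style function is therefore not guaranteed to be consistent with the Murmur3.hash32 values computed for the non-null keys.

```java
final class HashMismatchSketch {
    // Reads 4 bytes starting at i as a little-endian int (block order used by
    // both textbook Murmur variants).
    static int readIntLE(byte[] d, int i) {
        return (d[i] & 0xff) | ((d[i + 1] & 0xff) << 8)
             | ((d[i + 2] & 0xff) << 16) | ((d[i + 3] & 0xff) << 24);
    }

    // Textbook 32-bit MurmurHash2 (multiplier 0x5bd1e995, shift 24).
    static int murmurHash2(byte[] data, int offset, int length, int seed) {
        final int m = 0x5bd1e995;
        int h = seed ^ length;
        int i = offset;
        for (int end = offset + (length & ~3); i < end; i += 4) {
            int k = readIntLE(data, i);
            k *= m;
            k ^= k >>> 24;
            k *= m;
            h *= m;
            h ^= k;
        }
        switch (length & 3) {          // mix in the 1-3 trailing bytes
            case 3: h ^= (data[i + 2] & 0xff) << 16; // fall through
            case 2: h ^= (data[i + 1] & 0xff) << 8;  // fall through
            case 1: h ^= data[i] & 0xff;
                    h *= m;
            default: break;
        }
        h ^= h >>> 13;                 // final avalanche
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Textbook MurmurHash3 x86_32 (constants 0xcc9e2d51, 0x1b873593).
    static int murmur3Hash32(byte[] data, int offset, int length, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h = seed;
        int i = offset;
        for (int end = offset + (length & ~3); i < end; i += 4) {
            int k = readIntLE(data, i);
            k *= c1;
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
        }
        int k = 0;
        switch (length & 3) {          // mix in the 1-3 trailing bytes
            case 3: k ^= (data[i + 2] & 0xff) << 16; // fall through
            case 2: k ^= (data[i + 1] & 0xff) << 8;  // fall through
            case 1: k ^= data[i] & 0xff;
                    k *= c1;
                    k = Integer.rotateLeft(k, 15);
                    k *= c2;
                    h ^= k;
            default: break;
        }
        h ^= length;                   // fmix32 finalizer
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a serialized all-NULL key; not Hive's encoding.
        byte[] nullKeyBytes = {0, 0, 0, 0};
        int viaMurmur3 = murmur3Hash32(nullKeyBytes, 0, nullKeyBytes.length, 0);
        int viaMurmur2 = murmurHash2(nullKeyBytes, 0, nullKeyBytes.length, 0);
        System.out.println("Murmur3:     0x" + Integer.toHexString(viaMurmur3));
        System.out.println("MurmurHash2: 0x" + Integer.toHexString(viaMurmur2));
        System.out.println("same hash? " + (viaMurmur3 == viaMurmur2));
    }
}
```

With these textbook constants, Murmur3 of four zero bytes at seed 0 is 0x2362f9de (a commonly published test vector), while the MurmurHash2 result for the same bytes is a different value. Both functions return 0 for empty input at seed 0, which is why a mismatch like this can go unnoticed on trivial cases.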