[jira] [Work logged] (HIVE-25142) Rehashing in map join fast hash table causing corruption for large keys

ASF GitHub Bot (Jira) Thu, 20 May 2021 01:48:04 -0700


     [ https://issues.apache.org/jira/browse/HIVE-25142?focusedWorklogId=599679&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599679 ]


ASF GitHub Bot logged work on HIVE-25142:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/May/21 08:47
            Start Date: 20/May/21 08:47
    Worklog Time Spent: 10m 
      Work Description: pgaref commented on a change in pull request #2300:
URL: https://github.com/apache/hive/pull/2300#discussion_r635901296



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastBytesHashKeyRef.java
##########
@@ -69,12 +69,14 @@ public static int calculateHashCode(long refWord, WriteBuffers writeBuffers,
 
       // And, if current value is big we must read it.
       actualKeyLength = writeBuffers.readVInt(readPos);
-      keyAbsoluteOffset = absoluteOffset + actualKeyLength;
+
+      // Now the read position is set to start of the key as readVInt moved the
+      // position by size of key length.
+      return writeBuffers.hashCode(actualKeyLength, readPos);

Review comment:
       I guess the fact that we have to perform a read to get the actual key length is making things more complex here.
   Shall we add some comments at the method level for future reference?
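To make the reviewer's point concrete, here is a minimal standalone Java sketch (not Hive code) of why the removed line `keyAbsoluteOffset = absoluteOffset + actualKeyLength;` picks the wrong start position. It assumes a hypothetical buffer in which a large key is stored as a variable-length length prefix followed by the key bytes. The LEB128-style varint, the class `KeyOffsetSketch`, and its methods are illustrative assumptions and do not reproduce Hive's `WriteBuffers` encoding. Once the length has been decoded, the cursor already sits on the first key byte, so the key starts at the record offset plus the size of the encoded length, not plus the key length itself.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical, simplified model (not Hive code): a large key is stored as
// [varint length][key bytes]. The LEB128-style varint below is an illustrative
// assumption and does NOT match Hive's WriteBuffers encoding.
public class KeyOffsetSketch {

    // Appends a length-prefixed key and returns the offset where its record starts.
    static int appendKey(ByteArrayOutputStream out, byte[] key) {
        int recordOffset = out.size();
        int len = key.length;
        while ((len & ~0x7F) != 0) {
            out.write((len & 0x7F) | 0x80); // low 7 bits, high bit = "more bytes follow"
            len >>>= 7;
        }
        out.write(len);
        out.write(key, 0, key.length);
        return recordOffset;
    }

    // Decodes a varint at pos[0] and advances pos[0] past it, the way readVInt moves readPos.
    static int readVarInt(byte[] buf, int[] pos) {
        int result = 0;
        for (int shift = 0; ; shift += 7) {
            int b = buf[pos[0]++] & 0xFF;
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
    }

    // Correct: after decoding the length, the cursor already points at the first key byte.
    static byte[] readKeyCorrect(byte[] buf, int recordOffset) {
        int[] pos = {recordOffset};
        int keyLength = readVarInt(buf, pos);
        return Arrays.copyOfRange(buf, pos[0], pos[0] + keyLength);
    }

    // Buggy: mirrors "absoluteOffset + actualKeyLength", which skips past the key start.
    static byte[] readKeyBuggy(byte[] buf, int recordOffset) {
        int[] pos = {recordOffset};
        int keyLength = readVarInt(buf, pos);
        int start = recordOffset + keyLength;
        return Arrays.copyOfRange(buf, start, start + keyLength);
    }

    public static void main(String[] args) {
        byte[] key = new byte[300]; // longer than 255, so the length is kept in the buffer
        for (int i = 0; i < key.length; i++) {
            key[i] = (byte) ('a' + i % 26);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int offset = appendKey(out, key);
        byte[] buf = out.toByteArray();
        System.out.println("correct: " + Arrays.equals(key, readKeyCorrect(buf, offset)));
        System.out.println("buggy:   " + Arrays.equals(key, readKeyBuggy(buf, offset)));
    }
}
```

Running `main` prints `correct: true` and `buggy:   false`. Here the length 300 takes two varint bytes, so the key occupies bytes 2 to 301, while the buggy start lands at byte 300, near the end of the key. Any hash computed over those bytes cannot match the hash computed when the key was first inserted.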




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 599679)
    Time Spent: 0.5h  (was: 20m)

> Rehashing in map join fast hash table causing corruption for large keys
> ------------------------------------------------------------------------
>
>                 Key: HIVE-25142
>                 URL: https://issues.apache.org/jira/browse/HIVE-25142
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In map join, the hash table is built from the join keys. To support
> rehashing, the keys are stored in a write buffer, and the hash table holds
> each key's offset together with its hash code. During rehashing, the offset
> is read from the hash table and the hash code is generated again from the
> stored key. For large keys (length greater than 255), the key length is also
> stored in the buffer along with the key. In the fast hash table
> implementation, the key is not extracted correctly: a code bug causes the
> wrong bytes to be read as the key, which produces a wrong hash code and
> corrupts the hash table.
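The rehashing scheme described in the issue can be modeled with a small toy table. The sketch below is a hypothetical illustration, not Hive's `VectorMapJoinFast*` implementation. Each slot holds a key's offset into an append-only buffer, the key's hash code, and the key's length when that length is at most 255. Longer keys instead carry a 4-byte length prefix in the buffer, a simplification assumed here. On growth, `rehash()` re-reads every key from the buffer and recomputes its hash. That is the step where an incorrect key start position corrupts the table.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical toy model (not Hive code): keys live in an append-only buffer and
// each slot keeps the key's offset, its hash code, and its length when <= 255.
// Longer keys instead store their length as a 4-byte prefix in the buffer.
public class RehashSketch {
    static final int LARGE = -1; // slot length marker: read the length from the buffer
    static final int EMPTY = -1; // slot offset marker: unused slot

    final ByteArrayOutputStream keys = new ByteArrayOutputStream();
    int[] slotOffset = newSlots(4);
    int[] slotLength = new int[4];
    int[] slotHash = new int[4];
    int count = 0;

    static int[] newSlots(int n) {
        int[] a = new int[n];
        Arrays.fill(a, EMPTY);
        return a;
    }

    // Linear-probing insert of an (offset, length, hash) entry.
    static void insertSlot(int offset, int len, int hash, int[] offs, int[] lens, int[] hashes) {
        int i = Math.floorMod(hash, offs.length);
        while (offs[i] != EMPTY) {
            i = (i + 1) % offs.length;
        }
        offs[i] = offset;
        lens[i] = len;
        hashes[i] = hash;
    }

    // Appends a distinct key; grows the table before it becomes more than half full.
    void put(byte[] key) {
        if (2 * (count + 1) > slotOffset.length) {
            rehash();
        }
        int offset = keys.size();
        int len = key.length;
        if (len > 255) {
            keys.write(ByteBuffer.allocate(4).putInt(len).array(), 0, 4);
            len = LARGE;
        }
        keys.write(key, 0, key.length);
        insertSlot(offset, len, Arrays.hashCode(key), slotOffset, slotLength, slotHash);
        count++;
    }

    // Re-reads a key from the buffer; a large key's bytes start after its 4-byte prefix.
    byte[] readKey(int offset, int slotLen) {
        byte[] buf = keys.toByteArray();
        if (slotLen != LARGE) {
            return Arrays.copyOfRange(buf, offset, offset + slotLen);
        }
        int len = ByteBuffer.wrap(buf, offset, 4).getInt();
        int start = offset + 4; // skip the prefix size, NOT the key length
        return Arrays.copyOfRange(buf, start, start + len);
    }

    // Doubles capacity and recomputes every hash from the key bytes in the buffer.
    void rehash() {
        int n = slotOffset.length * 2;
        int[] offs = newSlots(n), lens = new int[n], hashes = new int[n];
        for (int i = 0; i < slotOffset.length; i++) {
            if (slotOffset[i] != EMPTY) {
                int hash = Arrays.hashCode(readKey(slotOffset[i], slotLength[i]));
                insertSlot(slotOffset[i], slotLength[i], hash, offs, lens, hashes);
            }
        }
        slotOffset = offs;
        slotLength = lens;
        slotHash = hashes;
    }

    boolean contains(byte[] key) {
        int hash = Arrays.hashCode(key);
        for (int i = Math.floorMod(hash, slotOffset.length); slotOffset[i] != EMPTY;
                i = (i + 1) % slotOffset.length) {
            if (slotHash[i] == hash && Arrays.equals(key, readKey(slotOffset[i], slotLength[i]))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        RehashSketch table = new RehashSketch();
        byte[] big = new byte[300]; // longer than 255, so its length lives in the buffer
        Arrays.fill(big, (byte) 'x');
        table.put(big);
        for (int i = 0; i < 10; i++) {
            table.put(("key" + i).getBytes(StandardCharsets.UTF_8)); // triggers rehashes
        }
        System.out.println(table.contains(big)); // true
    }
}
```

With the correct `readKey`, the 300-byte key survives three resizes (4 to 8, 16, then 32 slots), and `main` prints `true`. If `start` were computed as `offset + len` instead, mirroring the line removed in the PR diff, the rehash would hash the wrong bytes and file the key under the wrong slot. `contains` would then fail to find it.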



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
