[ 
https://issues.apache.org/jira/browse/HIVE-27519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27519:
----------------------------------
    Priority: Major  (was: Critical)

> Infinite array growth when optimized hashtable size is set to 0
> ---------------------------------------------------------------
>
>                 Key: HIVE-27519
>                 URL: https://issues.apache.org/jira/browse/HIVE-27519
>             Project: Hive
>          Issue Type: Bug
>            Reporter: ConfX
>            Priority: Major
>         Attachments: reproduce.sh
>
>
> h2. What happened:
> When the optimized hashtable size is set to 0 via 
> {{hive.mapjoin.optimized.hashtable.wbsize = 0}}, 
> {{WriteBuffers.java#nextBufferToWrite()}} grows its buffer list without 
> bound until the JVM runs out of heap and the system crashes.
> h2. Buggy code:
> {noformat}
>   private void nextBufferToWrite() {
>     if (writePos.bufferIndex == (writeBuffers.size() - 1)) {
>       if ((1 + writeBuffers.size()) * ((long)wbSize) > maxSize) {   // <--- always false because wbSize is 0
>         throw new RuntimeException("Too much memory used by write buffers");
>       }
>       writeBuffers.add(new byte[wbSize]);  // <---- wbSize is 0 here
>     }
>     ++writePos.bufferIndex;
>     writePos.buffer = writeBuffers.get(writePos.bufferIndex);
>     writePos.offset = 0;
>   }{noformat}
> When the optimized hashtable size is set to 0, the variable {{wbSize}} is 
> 0, so each call to {{writeBuffers.add()}} appends a zero-length byte array. 
> The condition {{if (writePos.bufferIndex == (writeBuffers.size() - 1))}} 
> stays true on every call, because {{writePos.bufferIndex}} and the size of 
> {{writeBuffers}} both increase by one each time. The check 
> {{if ((1 + writeBuffers.size()) * ((long)wbSize) > maxSize)}} never becomes 
> true, because its left-hand side is always 0, so the RuntimeException is 
> never thrown. The method therefore keeps adding zero-length byte arrays to 
> {{writeBuffers}} until the JVM runs out of memory and the system crashes.
> h2. How to reproduce:
> (1) Set {{hive.mapjoin.optimized.hashtable.wbsize}} to 0
> (2) Run test 
> {{org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator#testMultiKey2}}
> For an easy reproduction, run the {{reproduce.sh}} in the attachment.
> h2. StackTrace:
> {noformat}
> java.lang.OutOfMemoryError: Java heap space
>         at java.base/java.util.Arrays.copyOf(Arrays.java:3689)
>         at java.base/java.util.ArrayList.grow(ArrayList.java:238)
>         at java.base/java.util.ArrayList.grow(ArrayList.java:243)
>         at java.base/java.util.ArrayList.add(ArrayList.java:486)
>         at java.base/java.util.ArrayList.add(ArrayList.java:499)
>         at org.apache.hadoop.hive.serde2.WriteBuffers.nextBufferToWrite(WriteBuffers.java:261)
>         at org.apache.hadoop.hive.serde2.WriteBuffers.write(WriteBuffers.java:237)
>         at org.apache.hadoop.hive.serde2.WriteBuffers.write(WriteBuffers.java:222)
>         at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.put(BytesBytesMultiHashMap.java:424)
>         at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.putRow(MapJoinBytesTableContainer.java:461)
>         at org.apache.hadoop.hive.ql.exec.vector.mapjoin.MapJoinTestConfig.loadTableContainerData(MapJoinTestConfig.java:794)
>         at org.apache.hadoop.hive.ql.exec.vector.mapjoin.MapJoinTestConfig.createMapJoin(MapJoinTestConfig.java:846)
>         at org.apache.hadoop.hive.ql.exec.vector.mapjoin.MapJoinTestConfig.createMapJoinImplementation(MapJoinTestConfig.java:997)
>         at org.apache.hadoop.hive.ql.exec.vector.mapjoin.MapJoinTestConfig.createMapJoinImplementation(MapJoinTestConfig.java:971)
>         at org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator.executeTestImplementation(TestMapJoinOperator.java:1968)
>         at org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator.executeRowModeOptimized(TestMapJoinOperator.java:1906)
>         at org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator.doExecuteTest(TestMapJoinOperator.java:1859)
>         at org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator.executeTestInner(TestMapJoinOperator.java:1807)
>         at org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator.executeTest(TestMapJoinOperator.java:1783)
>         at org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator.doTestMultiKey2(TestMapJoinOperator.java:1144)
>         at org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator.testMultiKey2(TestMapJoinOperator.java:1076){noformat}
> We are happy to provide a patch if this issue is confirmed.
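
The limit check quoted in the report can be modelled in isolation. What follows is a minimal, self-contained Java sketch, not the actual Hive code and not the reporter's patch. The names WbSizeGuardSketch, exceedsLimit, checkWbSize and buffersAddedBeforeLimit are hypothetical and introduced only for illustration, and the iteration cap exists solely so the demonstration terminates instead of exhausting the heap. It shows that the check never fires when wbSize is 0, and one possible guard that rejects a non-positive size up front.

```java
import java.util.ArrayList;
import java.util.List;

public class WbSizeGuardSketch {

  /** Same arithmetic as the limit check in the quoted nextBufferToWrite(). */
  static boolean exceedsLimit(int bufferCount, int wbSize, long maxSize) {
    return (1 + bufferCount) * ((long) wbSize) > maxSize;
  }

  /** Hypothetical guard (an assumption, not Hive's fix): reject a non-positive size. */
  static int checkWbSize(int wbSize) {
    if (wbSize <= 0) {
      throw new IllegalArgumentException("wbSize must be positive, got " + wbSize);
    }
    return wbSize;
  }

  /**
   * Adds buffers until the limit check fires and returns how many were added.
   * Returns -1 if the check never fires within maxIterations; the cap only
   * keeps this demo from running out of memory.
   */
  static int buffersAddedBeforeLimit(int wbSize, long maxSize, int maxIterations) {
    List<byte[]> writeBuffers = new ArrayList<>();
    for (int i = 0; i < maxIterations; i++) {
      if (exceedsLimit(writeBuffers.size(), wbSize, maxSize)) {
        return writeBuffers.size();
      }
      writeBuffers.add(new byte[wbSize]);
    }
    return -1;
  }

  public static void main(String[] args) {
    // With a positive size the check eventually trips: prints 4.
    System.out.println(buffersAddedBeforeLimit(4, 16, 1000));
    // With size 0 the product is always 0, so the check never trips: prints -1.
    System.out.println(buffersAddedBeforeLimit(0, 16, 1000));
    // The guard fails fast instead of letting the list grow.
    try {
      checkWbSize(0);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Validating the value once, where it is read, keeps the hot write path unchanged. Whether Hive should reject the setting or fall back to a default size is a design decision this sketch does not settle.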



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
