[ https://issues.apache.org/jira/browse/HIVE-27519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Denys Kuzmenko updated HIVE-27519:
----------------------------------
    Priority: Major  (was: Critical)

> Infinite array growth when optimized hashtable size is set to 0
> ---------------------------------------------------------------
>
>                 Key: HIVE-27519
>                 URL: https://issues.apache.org/jira/browse/HIVE-27519
>             Project: Hive
>          Issue Type: Bug
>            Reporter: ConfX
>            Priority: Major
>         Attachments: reproduce.sh
>
>
> h2. What happened:
> When {{hive.mapjoin.optimized.hashtable.wbsize}} is set to 0, the list of
> write buffers in {{WriteBuffers.java#nextBufferToWrite()}} grows without
> bound until the JVM runs out of heap and the system crashes.
> h2. Buggy code:
> {noformat}
>   private void nextBufferToWrite() {
>     if (writePos.bufferIndex == (writeBuffers.size() - 1)) {
>       if ((1 + writeBuffers.size()) * ((long)wbSize) > maxSize) {  // <--- always false because wbSize is 0
>         throw new RuntimeException("Too much memory used by write buffers");
>       }
>       writeBuffers.add(new byte[wbSize]);  // <---- wbSize is 0 here
>     }
>     ++writePos.bufferIndex;
>     writePos.buffer = writeBuffers.get(writePos.bufferIndex);
>     writePos.offset = 0;
>   }{noformat}
> When the optimized hashtable write buffer size is set to 0, the variable
> {{wbSize}} is 0, so every call to {{writeBuffers.add()}} appends a
> zero-length byte array. The condition
> {{if (writePos.bufferIndex == (writeBuffers.size() - 1))}} holds on every
> call, because {{writePos.bufferIndex}} and the size of {{writeBuffers}} both
> increase by one each time. The memory check
> {{if ((1 + writeBuffers.size()) * ((long)wbSize) > maxSize)}} never becomes
> true, because the product is always 0 when {{wbSize}} is 0, so the
> RuntimeException is never thrown. Since a zero-length buffer cannot hold any
> data, the write never makes progress, and the method keeps adding zero-length
> byte arrays to {{writeBuffers}} until the process hits an OutOfMemoryError
> and crashes.
> h2. How to reproduce:
> (1) Set {{hive.mapjoin.optimized.hashtable.wbsize}} to 0
> (2) Run test
> {{org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator#testMultiKey2}}
> For an easy reproduction, run the {{reproduce.sh}} in the attachment.
> h2. StackTrace:
> {noformat}
> java.lang.OutOfMemoryError: Java heap space
>     at java.base/java.util.Arrays.copyOf(Arrays.java:3689)
>     at java.base/java.util.ArrayList.grow(ArrayList.java:238)
>     at java.base/java.util.ArrayList.grow(ArrayList.java:243)
>     at java.base/java.util.ArrayList.add(ArrayList.java:486)
>     at java.base/java.util.ArrayList.add(ArrayList.java:499)
>     at org.apache.hadoop.hive.serde2.WriteBuffers.nextBufferToWrite(WriteBuffers.java:261)
>     at org.apache.hadoop.hive.serde2.WriteBuffers.write(WriteBuffers.java:237)
>     at org.apache.hadoop.hive.serde2.WriteBuffers.write(WriteBuffers.java:222)
>     at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.put(BytesBytesMultiHashMap.java:424)
>     at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.putRow(MapJoinBytesTableContainer.java:461)
>     at org.apache.hadoop.hive.ql.exec.vector.mapjoin.MapJoinTestConfig.loadTableContainerData(MapJoinTestConfig.java:794)
>     at org.apache.hadoop.hive.ql.exec.vector.mapjoin.MapJoinTestConfig.createMapJoin(MapJoinTestConfig.java:846)
>     at org.apache.hadoop.hive.ql.exec.vector.mapjoin.MapJoinTestConfig.createMapJoinImplementation(MapJoinTestConfig.java:997)
>     at org.apache.hadoop.hive.ql.exec.vector.mapjoin.MapJoinTestConfig.createMapJoinImplementation(MapJoinTestConfig.java:971)
>     at org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator.executeTestImplementation(TestMapJoinOperator.java:1968)
>     at org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator.executeRowModeOptimized(TestMapJoinOperator.java:1906)
>     at org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator.doExecuteTest(TestMapJoinOperator.java:1859)
>     at org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator.executeTestInner(TestMapJoinOperator.java:1807)
>     at org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator.executeTest(TestMapJoinOperator.java:1783)
>     at org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator.doTestMultiKey2(TestMapJoinOperator.java:1144)
>     at org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator.testMultiKey2(TestMapJoinOperator.java:1076){noformat}
> We are happy to provide a patch if this issue is confirmed.
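
The report does not say what a fix would look like. As an illustration only, the sketch below mirrors the shape of the quoted {{nextBufferToWrite()}} in a standalone class and adds one possible guard: rejecting a non-positive buffer size at construction time. The class name {{WriteBufferSketch}}, its constructor, and {{bufferCount()}} are hypothetical and are not part of Hive; an actual patch might instead validate the configuration value or fall back to a default.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for the buffer-growth logic quoted in the report.
 * This is NOT the Hive WriteBuffers class; names and structure are assumptions.
 */
final class WriteBufferSketch {
    private final List<byte[]> writeBuffers = new ArrayList<>();
    private final int wbSize;
    private final long maxSize;
    private int bufferIndex = -1; // no buffer selected yet

    WriteBufferSketch(int wbSize, long maxSize) {
        // Assumed guard: with wbSize == 0 the memory check below could never
        // fire, so a non-positive size is rejected before any allocation.
        if (wbSize <= 0) {
            throw new IllegalArgumentException("wbSize must be positive, got " + wbSize);
        }
        this.wbSize = wbSize;
        this.maxSize = maxSize;
    }

    /** Same shape as the quoted method: allocate when the last buffer is in use. */
    void nextBufferToWrite() {
        if (bufferIndex == writeBuffers.size() - 1) {
            // With wbSize > 0 this product grows on every allocation,
            // so the limit is eventually reached.
            if ((1 + writeBuffers.size()) * ((long) wbSize) > maxSize) {
                throw new RuntimeException("Too much memory used by write buffers");
            }
            writeBuffers.add(new byte[wbSize]);
        }
        ++bufferIndex;
    }

    int bufferCount() {
        return writeBuffers.size();
    }

    public static void main(String[] args) {
        try {
            new WriteBufferSketch(0, 1024L);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        WriteBufferSketch wb = new WriteBufferSketch(4, 16L);
        try {
            while (true) {
                wb.nextBufferToWrite();
            }
        } catch (RuntimeException e) {
            System.out.println("stopped after " + wb.bufferCount() + " buffers: " + e.getMessage());
        }
    }
}
```

With {{wbSize}} = 4 and {{maxSize}} = 16, the sketch allocates four buffers and the fifth call throws, which is the bounded behavior the existing memory check relies on. With {{wbSize}} = 0 the constructor fails immediately instead of letting the list grow.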