[ https://issues.apache.org/jira/browse/HIVE-23034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shubham Chaurasia reassigned HIVE-23034:
----------------------------------------

> Arrow serializer should not keep the reference of arrow offset and validity buffers
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-23034
>                 URL: https://issues.apache.org/jira/browse/HIVE-23034
>             Project: Hive
>          Issue Type: Bug
>          Components: llap, Serializers/Deserializers
>            Reporter: Shubham Chaurasia
>            Assignee: Shubham Chaurasia
>            Priority: Major
>
> Currently, part of the writeList() method in the Arrow serializer is implemented like this:
> {code:java}
> final ArrowBuf offsetBuffer = arrowVector.getOffsetBuffer();
> int nextOffset = 0;
> for (int rowIndex = 0; rowIndex < size; rowIndex++) {
>   int selectedIndex = rowIndex;
>   if (vectorizedRowBatch.selectedInUse) {
>     selectedIndex = vectorizedRowBatch.selected[rowIndex];
>   }
>   if (hiveVector.isNull[selectedIndex]) {
>     offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset);
>   } else {
>     offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset);
>     nextOffset += (int) hiveVector.lengths[selectedIndex];
>     arrowVector.setNotNull(rowIndex);
>   }
> }
> offsetBuffer.setInt(size * OFFSET_WIDTH, nextOffset);
> {code}
> 1) Here we obtain a reference with {{final ArrowBuf offsetBuffer = arrowVector.getOffsetBuffer();}} and then keep updating both the arrow vector and the offset buffer through it.
>
> Problem -
> {{arrowVector.setNotNull(rowIndex)}} checks the index on every call and, when a threshold is crossed, reallocates the offset and validity buffers, updates its internal references, and releases the old buffers (which decrements their reference count). The reference we obtained in 1) is now obsolete.
> Furthermore, if we try to read or write the old buffer, we see:
> {code:java}
> Caused by: io.netty.util.IllegalReferenceCountException: refCnt: 0
> 	at io.netty.buffer.AbstractByteBuf.ensureAccessible(AbstractByteBuf.java:1413)
> 	at io.netty.buffer.ArrowBuf.checkIndexD(ArrowBuf.java:131)
> 	at io.netty.buffer.ArrowBuf.chk(ArrowBuf.java:162)
> 	at io.netty.buffer.ArrowBuf.setInt(ArrowBuf.java:656)
> 	at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:432)
> 	at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:285)
> 	at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:352)
> 	at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:288)
> 	at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:419)
> 	at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:285)
> 	at org.apache.hadoop.hive.ql.io.arrow.Serializer.serializeBatch(Serializer.java:205)
> {code}
>
> Solution -
> This can be fixed by fetching the buffer again ({{arrowVector.getOffsetBuffer()}}) each time we want to update it, instead of caching the reference.
> In our internal tests, this is seen very frequently on Arrow 0.8.0 but not on 0.10.0. It should still be handled the same way on 0.10.0, since that version does the same thing.
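
The failure mode and the proposed fix can be sketched without Arrow on the classpath. The block below is only an illustration under stated assumptions: `RefCountedBuf` and `ListVectorStub` are hypothetical stand-ins for `ArrowBuf` and the Arrow list vector, not the real classes. The doubling policy, the initial capacity of 2, and the use of `IllegalStateException` are also assumptions chosen to make the reallocation easy to trigger. Offsets are indexed by slot rather than by byte, so `OFFSET_WIDTH` is omitted, and null rows are left out for brevity.

```java
import java.util.Arrays;

public class OffsetBufferSketch {

    /** Hypothetical stand-in for a reference-counted buffer such as ArrowBuf. */
    static final class RefCountedBuf {
        private final int[] slots;
        private int refCnt = 1;

        RefCountedBuf(int[] slots) {
            this.slots = slots;
        }

        void setInt(int index, int value) {
            ensureAccessible();
            slots[index] = value;
        }

        int getInt(int index) {
            ensureAccessible();
            return slots[index];
        }

        void release() {
            refCnt--;
        }

        private void ensureAccessible() {
            // Mirrors the "refCnt: 0" failure in the stack trace above.
            if (refCnt == 0) {
                throw new IllegalStateException("refCnt: 0");
            }
        }
    }

    /** Hypothetical stand-in for a list vector that reallocates its offset buffer. */
    static final class ListVectorStub {
        private int valueCapacity = 2; // assumed small so a reallocation happens quickly
        private RefCountedBuf offsets = new RefCountedBuf(new int[valueCapacity + 1]);

        RefCountedBuf getOffsetBuffer() {
            return offsets;
        }

        void setNotNull(int index) {
            // Grow until the index fits; the old buffer is released on each step.
            while (index >= valueCapacity) {
                valueCapacity *= 2;
                RefCountedBuf old = offsets;
                offsets = new RefCountedBuf(Arrays.copyOf(old.slots, valueCapacity + 1));
                old.release();
            }
        }
    }

    /** Problematic pattern: the buffer reference is cached once before the loop. */
    static void writeOffsetsStale(ListVectorStub vector, int[] lengths) {
        final RefCountedBuf offsetBuffer = vector.getOffsetBuffer();
        int nextOffset = 0;
        for (int row = 0; row < lengths.length; row++) {
            offsetBuffer.setInt(row, nextOffset); // throws once the buffer was released
            nextOffset += lengths[row];
            vector.setNotNull(row);
        }
        offsetBuffer.setInt(lengths.length, nextOffset);
    }

    /** Proposed pattern: fetch the current buffer every time it is updated. */
    static int[] writeOffsetsFresh(ListVectorStub vector, int[] lengths) {
        int nextOffset = 0;
        for (int row = 0; row < lengths.length; row++) {
            vector.getOffsetBuffer().setInt(row, nextOffset);
            nextOffset += lengths[row];
            vector.setNotNull(row);
        }
        vector.getOffsetBuffer().setInt(lengths.length, nextOffset);

        // Read the offsets back from the live buffer for inspection.
        int[] result = new int[lengths.length + 1];
        for (int i = 0; i < result.length; i++) {
            result[i] = vector.getOffsetBuffer().getInt(i);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] lengths = {2, 1, 3, 2};
        try {
            writeOffsetsStale(new ListVectorStub(), lengths);
            System.out.println("stale reference: no error");
        } catch (IllegalStateException e) {
            System.out.println("stale reference failed: " + e.getMessage());
        }
        System.out.println(Arrays.toString(writeOffsetsFresh(new ListVectorStub(), lengths)));
    }
}
```

With four rows and an assumed initial capacity of two, the stub reallocates while handling the third row, so the cached reference fails on the fourth write, while the re-fetching variant completes and yields the offsets 0, 2, 3, 6, 8. Re-fetching costs one extra accessor call per update, which in this sketch is just a field read.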