Shubham Chaurasia created HIVE-23034:
----------------------------------------

             Summary: Arrow serializer should not keep the reference of arrow 
offset and validity buffers
                 Key: HIVE-23034
                 URL: https://issues.apache.org/jira/browse/HIVE-23034
             Project: Hive
          Issue Type: Bug
          Components: llap, Serializers/Deserializers
            Reporter: Shubham Chaurasia
            Assignee: Shubham Chaurasia


Currently, part of the {{writeList()}} method in the Arrow serializer is 
implemented as follows:
{code:java}
    final ArrowBuf offsetBuffer = arrowVector.getOffsetBuffer();
    int nextOffset = 0;

    for (int rowIndex = 0; rowIndex < size; rowIndex++) {
      int selectedIndex = rowIndex;
      if (vectorizedRowBatch.selectedInUse) {
        selectedIndex = vectorizedRowBatch.selected[rowIndex];
      }
      if (hiveVector.isNull[selectedIndex]) {
        offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset);
      } else {
        offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset);
        nextOffset += (int) hiveVector.lengths[selectedIndex];
        arrowVector.setNotNull(rowIndex);
      }
    }
    offsetBuffer.setInt(size * OFFSET_WIDTH, nextOffset);
{code}

1) Here we obtain a reference to the offset buffer once, before the loop 
({{final ArrowBuf offsetBuffer = arrowVector.getOffsetBuffer();}}), and then 
keep updating both the arrow vector and that cached offset buffer. 

Problem - 

{{arrowVector.setNotNull(rowIndex)}} checks the index on every call and, when 
a threshold is crossed, reallocates the offset and validity buffers, updates 
its internal references, and releases the old buffers (which decrements their 
reference count). The reference obtained in 1) then becomes stale. If we try 
to read or write the old buffer, we see - 
{code:java}
Caused by: io.netty.util.IllegalReferenceCountException: refCnt: 0
        at io.netty.buffer.AbstractByteBuf.ensureAccessible(AbstractByteBuf.java:1413)
        at io.netty.buffer.ArrowBuf.checkIndexD(ArrowBuf.java:131)
        at io.netty.buffer.ArrowBuf.chk(ArrowBuf.java:162)
        at io.netty.buffer.ArrowBuf.setInt(ArrowBuf.java:656)
        at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:432)
        at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:285)
        at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:352)
        at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:288)
        at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:419)
        at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:285)
        at org.apache.hadoop.hive.ql.io.arrow.Serializer.serializeBatch(Serializer.java:205)
{code}
 
Solution - 
This can be fixed by fetching the buffer again 
({{arrowVector.getOffsetBuffer()}}) each time we want to update it, instead of 
caching the reference across calls to {{setNotNull()}}. 
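
To illustrate the failure mode and the proposed fix without depending on Arrow, 
here is a minimal, self-contained sketch. {{Buf}} and {{Vec}} are hypothetical 
stand-ins for {{ArrowBuf}} and the list vector, not the real Arrow classes. The 
doubling growth policy and the use of {{IllegalStateException}} are assumptions 
made for the model, and null rows and the {{selected[]}} indirection are left 
out for brevity.

```java
import java.util.Arrays;

class StaleBufferSketch {

    /** Hypothetical stand-in for a reference-counted ArrowBuf (one int per slot). */
    static final class Buf {
        final int[] data;
        int refCnt = 1;

        Buf(int slots) { data = new int[slots]; }

        void setInt(int slot, int value) {
            // Mimics the accessibility check that fails in the stack trace above.
            if (refCnt == 0) {
                throw new IllegalStateException("refCnt: 0");
            }
            data[slot] = value;
        }
    }

    /** Hypothetical stand-in for the list vector; it owns and may replace its offset buffer. */
    static final class Vec {
        private int valueCapacity;
        private Buf offsets;

        Vec(int initialCapacity) {
            valueCapacity = Math.max(1, initialCapacity);
            offsets = new Buf(valueCapacity + 1);
        }

        Buf getOffsetBuffer() { return offsets; }

        void setNotNull(int index) {
            // Assumed policy: double until index fits, copy, release the old buffer.
            while (index >= valueCapacity) {
                valueCapacity *= 2;
                Buf bigger = new Buf(valueCapacity + 1);
                System.arraycopy(offsets.data, 0, bigger.data, 0, offsets.data.length);
                offsets.refCnt = 0;
                offsets = bigger;
            }
        }
    }

    /** Pattern from the issue: the buffer reference is cached before the loop. */
    static int[] writeOffsetsCached(int[] lengths, int initialCapacity) {
        Vec vec = new Vec(initialCapacity);
        Buf cached = vec.getOffsetBuffer();
        int nextOffset = 0;
        for (int row = 0; row < lengths.length; row++) {
            cached.setInt(row, nextOffset);   // throws once 'cached' has been released
            nextOffset += lengths[row];
            vec.setNotNull(row);              // may swap the buffer behind our back
        }
        cached.setInt(lengths.length, nextOffset);
        return Arrays.copyOf(vec.getOffsetBuffer().data, lengths.length + 1);
    }

    /** Proposed fix: ask the vector for its current buffer on every write. */
    static int[] writeOffsetsRefetching(int[] lengths, int initialCapacity) {
        Vec vec = new Vec(initialCapacity);
        int nextOffset = 0;
        for (int row = 0; row < lengths.length; row++) {
            vec.getOffsetBuffer().setInt(row, nextOffset);
            nextOffset += lengths[row];
            vec.setNotNull(row);
        }
        vec.getOffsetBuffer().setInt(lengths.length, nextOffset);
        return Arrays.copyOf(vec.getOffsetBuffer().data, lengths.length + 1);
    }

    public static void main(String[] args) {
        int[] lengths = {2, 0, 3};
        System.out.println(Arrays.toString(writeOffsetsRefetching(lengths, 1))); // [0, 2, 2, 5]
        try {
            writeOffsetsCached(lengths, 1);
        } catch (IllegalStateException e) {
            System.out.println("cached reference failed: " + e.getMessage());
        }
    }
}
```

In this model the cached variant fails on the first write after a 
reallocation, while the re-fetching variant always writes to the live buffer. 
The extra getter call per write is cheap compared with the cost of a failed 
batch.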

In our internal tests, this is seen very frequently on Arrow 0.8.0 but not on 
0.10.0. It should nevertheless be handled the same way for 0.10.0, as it does 
the same thing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
