Hi all,

When I worked on ARROW-7539[1], I met some problems and not sure what's the
proper way to solve it.


This issue was about to avoid set reader/writer indices in
FieldVector#getFieldBuffers according to the following reasons:

i. getBuffers set reader/writer indices and it's right for the purpose of
sending the data over the wire

ii. getFieldBuffers is distinct from getBuffers, it should be for getting
access to underlying data for higher-performance algorithms


Currently in VectorUnloader, we used getFieldBuffers to create
ArrowRecordBatch that's why we keep writer/reader indices in getFieldBuffers
, we should use getBuffers instead. But during the change, we found another
problem:

The order of validity and offset buffers are not in the same order in
ListVector(getBuffers's order is right), changing the API in VectorUnloader
creates problems with serialization/deserialization resulting in test
failures in Dremio which would break backward compatibility with existing
serialised files.


Micah gives a solution but seems doesn't reach consistent in the PR thread[2
]:

   1. Remove setReaderWriterIndeces in getFieldBuffers
   2. Deprecate getBuffers
   3. Introduce a new getIpcBuffers which is unambiguously used for writing
   record batches (i.e. in VectorUnloader).
   4. Update documentation where it makes sense based on all this
   conversation.


More details and discussions can be seen from the PR and hope to get more
feedback.



Thanks,

Ji Liu



[1] https://issues.apache.org/jira/browse/ARROW-7539

[2] https://github.com/apache/arrow/pull/6156

Reply via email to