Hi all,

While working on ARROW-7539 [1], I ran into some problems and I am not sure what the proper way to solve them is.

The issue is about no longer setting reader/writer indices in FieldVector#getFieldBuffers, for the following reasons:

i. getBuffers sets the reader/writer indices, which is right for the purpose of sending the data over the wire.
ii. getFieldBuffers is distinct from getBuffers; it should be for getting access to the underlying data for higher-performance algorithms.

Currently, VectorUnloader uses getFieldBuffers to create the ArrowRecordBatch, which is why getFieldBuffers still sets the reader/writer indices. VectorUnloader should use getBuffers instead.

While making that change, we found another problem: in ListVector, getFieldBuffers and getBuffers return the validity and offset buffers in different orders (the order in getBuffers is the correct one). Changing the API used in VectorUnloader therefore causes serialization/deserialization problems. It results in test failures in Dremio and would break backward compatibility with existing serialized files.

Micah proposed a solution in the PR thread [2], but it does not seem to have reached consensus:

1. Remove the setting of reader/writer indices from getFieldBuffers.
2. Deprecate getBuffers.
3. Introduce a new getIpcBuffers which is unambiguously used for writing record batches (i.e. in VectorUnloader).
4. Update documentation where it makes sense based on all this conversation.

More details and discussion can be found in the PR. I hope to get more feedback.

Thanks,
Ji Liu

[1] https://issues.apache.org/jira/browse/ARROW-7539
[2] https://github.com/apache/arrow/pull/6156