Thinking about this some more, I think maybe we should also potentially try to deprecate hold off on any changes to getFieldBuffers. It should likely follow the same sort of pattern for getBuffers (create a new method that doesn't set reader/writer indices and deprecate getFieldBuffers). But I think that can be handled in a separate PR?
Anybody else have thoughts? -Micah On Tue, Aug 4, 2020 at 11:24 PM Ji Liu <tianc...@apache.org> wrote: > hi liya, > Thanks for your careful review, it is a typo, the order of getBuffers is > wrong. > > Fan Liya <liya.fa...@gmail.com> 于2020年8月5日周三 下午2:14写道: > > > Hi Ji, > > > > IMO, for the correct order, the validity buffer should precede the offset > > buffer (e.g. this is the order used by BaseVariableWidthVector & > > BaseLargeVariableWidthVector). > > In ListVector#getBuffers, the offset buffer precedes the validity buffer, > > so I am a little confused why you say the order of ListVector#getBuffers > is > > right? > > > > Best, > > Liya Fan > > > > On Wed, Aug 5, 2020 at 12:32 PM Micah Kornfield <emkornfi...@gmail.com> > > wrote: > > > > > FWIW, I lack historical context on how these methods evolved, so I'd > > > appreciate insight from anyone who has worked on the java codebase for > a > > > longer period of time. The current situation seems less then ideal. > > > > > > On Tue, Aug 4, 2020 at 12:55 AM Ji Liu <tianc...@apache.org> wrote: > > > > > > > Hi all, > > > > > > > > > > > > When I worked on ARROW-7539[1], I met some problems and not sure > what's > > > the > > > > proper way to solve it. > > > > > > > > > > > > This issue was about to avoid set reader/writer indices in > > > > FieldVector#getFieldBuffers according to the following reasons: > > > > > > > > i. getBuffers set reader/writer indices and it's right for the > purpose > > of > > > > sending the data over the wire > > > > > > > > ii. getFieldBuffers is distinct from getBuffers, it should be for > > getting > > > > access to underlying data for higher-performance algorithms > > > > > > > > > > > > Currently in VectorUnloader, we used getFieldBuffers to create > > > > ArrowRecordBatch that's why we keep writer/reader indices in > > > > getFieldBuffers > > > > , we should use getBuffers instead. But during the change, we found > > > another > > > > problem: > > > > > > > > The order of validity and offset buffers are not in the same order in > > > > ListVector(getBuffers's order is right), changing the API in > > > VectorUnloader > > > > creates problems with serialization/deserialization resulting in test > > > > failures in Dremio which would break backward compatibility with > > existing > > > > serialised files. > > > > > > > > > > > > Micah gives a solution but seems doesn't reach consistent in the PR > > > > thread[2 > > > > ]: > > > > > > > > 1. Remove setReaderWriterIndeces in getFieldBuffers > > > > 2. Deprecate getBuffers > > > > 3. Introduce a new getIpcBuffers which is unambiguously used for > > > writing > > > > record batches (i.e. in VectorUnloader). > > > > 4. Update documentation where it makes sense based on all this > > > > conversation. > > > > > > > > > > > > More details and discussions can be seen from the PR and hope to get > > more > > > > feedback. > > > > > > > > > > > > > > > > Thanks, > > > > > > > > Ji Liu > > > > > > > > > > > > > > > > [1] https://issues.apache.org/jira/browse/ARROW-7539 > > > > > > > > [2] https://github.com/apache/arrow/pull/6156 > > > > > > > > > >