Re: [DISSCUSS][JAVA] Avoid set reader/writer indices in FieldVector#getFieldBuffers

Micah Kornfield Fri, 07 Aug 2020 20:51:09 -0700

 Thinking about this some more, I think maybe we should also potentially
try to deprecate hold off on any changes to getFieldBuffers.  It should
likely follow the same sort of pattern for getBuffers (create a new method
that doesn't set reader/writer indices and deprecate getFieldBuffers).  But
I think that can be handled in a separate PR?


Anybody else have thoughts?

-Micah

On Tue, Aug 4, 2020 at 11:24 PM Ji Liu <[email protected]> wrote:

> hi liya,
> Thanks for your careful review, it is a typo, the order of getBuffers is
> wrong.
>
> Fan Liya <[email protected]> 于2020年8月5日周三 下午2:14写道：
>
> > Hi Ji,
> >
> > IMO, for the correct order, the validity buffer should precede the offset
> > buffer (e.g. this is the order used by BaseVariableWidthVector &
> > BaseLargeVariableWidthVector).
> > In ListVector#getBuffers, the offset buffer precedes the validity buffer,
> > so I am a little confused why you say the order of ListVector#getBuffers
> is
> > right?
> >
> > Best,
> > Liya Fan
> >
> > On Wed, Aug 5, 2020 at 12:32 PM Micah Kornfield <[email protected]>
> > wrote:
> >
> > > FWIW, I lack historical context on how these methods evolved, so I'd
> > > appreciate insight from anyone who has worked on the java codebase for
> a
> > > longer period of time.  The current situation seems less then ideal.
> > >
> > > On Tue, Aug 4, 2020 at 12:55 AM Ji Liu <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > >
> > > > When I worked on ARROW-7539[1], I met some problems and not sure
> what's
> > > the
> > > > proper way to solve it.
> > > >
> > > >
> > > > This issue was about to avoid set reader/writer indices in
> > > > FieldVector#getFieldBuffers according to the following reasons:
> > > >
> > > > i. getBuffers set reader/writer indices and it's right for the
> purpose
> > of
> > > > sending the data over the wire
> > > >
> > > > ii. getFieldBuffers is distinct from getBuffers, it should be for
> > getting
> > > > access to underlying data for higher-performance algorithms
> > > >
> > > >
> > > > Currently in VectorUnloader, we used getFieldBuffers to create
> > > > ArrowRecordBatch that's why we keep writer/reader indices in
> > > > getFieldBuffers
> > > > , we should use getBuffers instead. But during the change, we found
> > > another
> > > > problem:
> > > >
> > > > The order of validity and offset buffers are not in the same order in
> > > > ListVector(getBuffers's order is right), changing the API in
> > > VectorUnloader
> > > > creates problems with serialization/deserialization resulting in test
> > > > failures in Dremio which would break backward compatibility with
> > existing
> > > > serialised files.
> > > >
> > > >
> > > > Micah gives a solution but seems doesn't reach consistent in the PR
> > > > thread[2
> > > > ]:
> > > >
> > > >    1. Remove setReaderWriterIndeces in getFieldBuffers
> > > >    2. Deprecate getBuffers
> > > >    3. Introduce a new getIpcBuffers which is unambiguously used for
> > > writing
> > > >    record batches (i.e. in VectorUnloader).
> > > >    4. Update documentation where it makes sense based on all this
> > > >    conversation.
> > > >
> > > >
> > > > More details and discussions can be seen from the PR and hope to get
> > more
> > > > feedback.
> > > >
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > Ji Liu
> > > >
> > > >
> > > >
> > > > [1] https://issues.apache.org/jira/browse/ARROW-7539
> > > >
> > > > [2] https://github.com/apache/arrow/pull/6156
> > > >
> > >
> >
>

Re: [DISSCUSS][JAVA] Avoid set reader/writer indices in FieldVector#getFieldBuffers

Reply via email to