Another key piece of information[1] provided by Jorge on the ticket is
that there is an (older) IPC test case file that has empty offsets buffers
for arrays of zero length, which is why this issue came up.
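Here is a minimal Rust sketch of the two layouts discussed in this thread.
It has no dependencies, and the names `build_offsets` and
`normalized_offsets` are hypothetical: they are not arrow-rs or arrow2 API.
A writer following the "length + 1 offsets" rule produces `[0]` for an
empty string array. A lenient reader could additionally accept the empty
offsets buffer found in the older IPC test file, but only when the array
length is zero.

```rust
/// Hypothetical writer-side helper: builds i32 offsets and the values buffer
/// for a string array. An empty input yields offsets `[0]` (length + 1 = 1).
fn build_offsets(strings: &[&str]) -> (Vec<i32>, Vec<u8>) {
    let mut offsets = Vec::with_capacity(strings.len() + 1);
    let mut values = Vec::new();
    offsets.push(0);
    for s in strings {
        values.extend_from_slice(s.as_bytes());
        offsets.push(values.len() as i32);
    }
    (offsets, values)
}

/// Hypothetical reader-side helper: validates raw offsets for an array of
/// `len` elements over a values buffer of `values_len` bytes. The only
/// leniency (an assumption, not a spec rule): an empty buffer for a
/// zero-length array is treated as `[0]`.
fn normalized_offsets(len: usize, raw: &[i32], values_len: usize) -> Result<Vec<i32>, String> {
    if raw.is_empty() {
        return if len == 0 {
            Ok(vec![0])
        } else {
            Err(format!("empty offsets buffer for array of length {}", len))
        };
    }
    if raw.len() != len + 1 {
        return Err(format!("expected {} offsets, found {}", len + 1, raw.len()));
    }
    if raw[0] < 0 {
        return Err("first offset is negative".to_string());
    }
    if raw.windows(2).any(|w| w[0] > w[1]) {
        return Err("offsets must be non-decreasing".to_string());
    }
    // The last offset must not point past the end of the values buffer.
    if raw[len] as usize > values_len {
        return Err(format!("last offset {} exceeds values length {}", raw[len], values_len));
    }
    Ok(raw.to_vec())
}
```

With this approach, `normalized_offsets(0, &[], 0)` and
`normalized_offsets(0, &[0], 0)` both yield `[0]`, so downstream code only
ever sees the spec-conformant form. An empty buffer for a non-empty array
is still rejected.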

I think we have now closed this issue to our satisfaction. Thank you all
for the comments.

Andrew

[1] https://github.com/apache/arrow-rs/issues/1620#issuecomment-1121011247

On Mon, May 9, 2022 at 11:29 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> I think the behavior is undefined.  For an empty string array the offsets
> buffer generally shouldn't be referenced.
>
> On Mon, May 9, 2022 at 10:27 AM Sasha Krassovsky
> <krassovskysa...@gmail.com> wrote:
>
> > Hello,
> > I think an empty string array will have an offsets buffer of length 1
> > with the value 0.
> >
> > Sasha Krassovsky
> >
> > > On May 9, 2022, at 05:23, Yang hao <1371656737...@gmail.com> wrote:
> > >
> > > For an empty (list, binary, string) array, what should the offsets
> > > buffer be? An empty buffer, or a buffer containing a single zero? Or
> > > are both valid?
> > >
> > > Here is some related information I found:
> > >
> > > 1. In the Apache Arrow Format specification
> > >    (https://arrow.apache.org/docs/format/Columnar.html#variable-size-binary-layout):
> > >
> > >    The offsets buffer contains length + 1 signed integers (either
> > >    32-bit or 64-bit, depending on the logical type).
> > >    Generally the first slot in the offsets array is 0, and the last
> > >    slot is the length of the values array.
> > >
> > > 2. A related issue in arrow-rs
> > >    (https://github.com/apache/arrow-rs/issues/1620):
> > >
> > >    We found that some test data in arrow-testing has an empty offsets
> > >    buffer (though we are not 100% sure).
> > >
> > > 3. In arrow2 (Rust), the offsets buffer cannot be empty:
> > >
> > >    Link 1: https://github.com/jorgecarleitao/arrow2/blob/8fb3b8d3f05cdc3d51f1314cfeb9bec39196789c/src/array/specification.rs#L101
> > >    Link 2: https://github.com/jorgecarleitao/arrow2/blob/main/src/io/ipc/read/array/binary.rs#L45
> > >
> > > 4. In arrow (C++) (sorry, I am not familiar with the C++
> > >    implementation):
> > >
> > >    Link: https://github.com/apache/arrow/blob/c70426f73326b3852d1bd7c31d98be4743f3fcba/cpp/src/arrow/array/array_nested.cc#L111-L113
> > >
> > > Looking forward to your opinions!
> > >
> > >
> > > Regards,
> > > Remzi
> >
>
