I think the behavior is undefined. For an empty string array, the offsets
buffer generally shouldn't be referenced at all.
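
To illustrate, here is a minimal Python sketch of a reader that works with both conventions. It never indexes the offsets buffer when the array length is 0. The function name `read_strings` and its plain-list/bytes inputs are hypothetical, not any Arrow implementation's API:

```python
def read_strings(length, offsets, data):
    """Decode a variable-size string array (hypothetical reader).

    length  -- logical number of slots in the array
    offsets -- ints from the offsets buffer (may be [] or [0] when length == 0)
    data    -- the values buffer as bytes
    """
    if length == 0:
        # Never index the offsets buffer for an empty array, so an empty
        # buffer and a buffer holding a single 0 are treated the same.
        return []
    if len(offsets) < length + 1:
        raise ValueError("offsets buffer must hold length + 1 entries")
    # Slot i spans data[offsets[i]:offsets[i + 1]].
    return [data[offsets[i]:offsets[i + 1]].decode("utf-8") for i in range(length)]


# Both encodings of an empty array decode to the same result.
print(read_strings(0, [], b""))              # []
print(read_strings(0, [0], b""))             # []
print(read_strings(2, [0, 2, 5], b"hiabc"))  # ['hi', 'abc']
```

Because the early return happens before any offsets access, a reader written this way doesn't need to decide which convention is "correct".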

On Mon, May 9, 2022 at 10:27 AM Sasha Krassovsky <krassovskysa...@gmail.com>
wrote:

> Hello,
> I think an empty string array will have an offsets buffer of length 1 with
> the value 0.
>
> Sasha Krassovsky
>
> > On 9 May 2022, at 05:23, Yang hao <1371656737...@gmail.com> wrote:
> >
> > For an empty (list, binary, string) array, what should the offsets
> > buffer be? An empty buffer, or a buffer containing a single zero? Or are
> > both valid?
> >
> > Here is some related information I found:
> >
> >  1. In the Apache Arrow Format specification
> >     (https://arrow.apache.org/docs/format/Columnar.html#variable-size-binary-layout):
> >
> >     The offsets buffer contains length + 1 signed integers (either 32-bit
> >     or 64-bit, depending on the logical type).
> >     Generally the first slot in the offsets array is 0, and the last slot
> >     is the length of the values array.
> >
> >  2. A related issue in arrow-rs:
> >     https://github.com/apache/arrow-rs/issues/1620
> >
> >     We found that some test data in arrow-testing has an empty offsets
> >     buffer (though we are not 100% sure).
> >
> >  3. In arrow2 (Rust), the offsets buffer cannot be empty:
> >
> >     Link1: https://github.com/jorgecarleitao/arrow2/blob/8fb3b8d3f05cdc3d51f1314cfeb9bec39196789c/src/array/specification.rs#L101
> >     Link2: https://github.com/jorgecarleitao/arrow2/blob/main/src/io/ipc/read/array/binary.rs#L45
> >
> >  4. In Arrow C++ (sorry, I am not familiar with the C++ implementation):
> >
> >     https://github.com/apache/arrow/blob/c70426f73326b3852d1bd7c31d98be4743f3fcba/cpp/src/arrow/array/array_nested.cc#L111-L113
> >
> > Looking forward to your opinions!
> >
> >
> > Regards,
> > Remzi
>
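
For reference, here is a minimal Python sketch of the layout in the spec excerpt quoted above. It assumes offsets are plain Python ints rather than 32- or 64-bit buffers. Building offsets from a running byte count always produces length + 1 entries, so an empty array naturally yields [0]. The hypothetical `validate_offsets` check rejects an empty offsets buffer, similar in spirit to the arrow2 behavior mentioned in the thread. It is not arrow2's code.

```python
def build_string_array(values):
    """Encode strings as (offsets, data) following the length + 1 rule."""
    offsets = [0]  # first slot is 0 for a freshly built array
    data = bytearray()
    for v in values:
        data += v.encode("utf-8")
        offsets.append(len(data))  # end of this slot == start of the next
    return offsets, bytes(data)


def validate_offsets(offsets, length, data_len):
    """Hypothetical strict check: requires exactly length + 1 offsets."""
    if len(offsets) != length + 1:
        raise ValueError("expected length + 1 offsets")
    # The spec says the first slot is *generally* 0, so a nonzero start is allowed.
    if offsets[0] < 0 or offsets[-1] > data_len:
        raise ValueError("offsets out of bounds of the values buffer")
    if any(a > b for a, b in zip(offsets, offsets[1:])):
        raise ValueError("offsets must be non-decreasing")


offsets, data = build_string_array([])
print(offsets, data)  # [0] b''
validate_offsets(offsets, 0, len(data))  # passes
try:
    validate_offsets([], 0, 0)
except ValueError as e:
    print("rejected:", e)  # rejected: expected length + 1 offsets
```

A strict check like this makes the single-zero buffer the only accepted encoding. The tolerant reader approach instead accepts both, at the cost of special-casing length 0.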
