On 26/10/2021 at 21:30, Jorge Cardoso Leitão wrote:
> Hi,
> One aspect of the design of "arrow2" is that it deals with array slices
> differently from the rest of the implementations. Essentially, the offset
> is not stored in ArrayData, but on each individual Buffer. Some important
> consequences are:
> * people can work with buffers and bitmaps without having to drag the
>   corresponding array offset with them (which is a common source of
>   unsoundness in the official Rust implementation)
> * arrays can store buffers/bitmaps with independent offsets
> * it does not roundtrip over the C data interface at zero cost, because the
>   C data interface only allows a single offset per array, not per
>   buffer/bitmap.
To be clear, this only comes into play for bit buffers (such as the
validity bitmap), right? Otherwise, the offset can just be incorporated
into the buffer's base pointer.
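To illustrate the distinction, here is a minimal sketch in Rust. The names `slice_values` and `BitmapSlice` are hypothetical and made up for this example; they are not the arrow2 or arrow-rs API. It assumes the least-significant-bit numbering that the Arrow format uses for validity bitmaps. A buffer of fixed-width values can be sliced by moving its start, whereas a bitmap can only skip whole bytes that way and has to remember the remaining 0 to 7 bits explicitly.

```rust
// Hypothetical types for illustration only; not the arrow2 or arrow-rs API.

/// Slicing fixed-width values: the offset folds into the start of the slice,
/// so nothing besides pointer and length has to be stored.
fn slice_values(values: &[i32], offset: usize, len: usize) -> &[i32] {
    &values[offset..offset + len]
}

/// A bitmap slice has to carry a bit offset, because a pointer cannot
/// address an individual bit.
#[derive(Clone, Copy)]
struct BitmapSlice<'a> {
    bytes: &'a [u8],
    offset: usize, // in bits, always in 0..8 after `slice`
    len: usize,    // in bits
}

impl<'a> BitmapSlice<'a> {
    fn new(bytes: &'a [u8], len: usize) -> Self {
        assert!(len <= bytes.len() * 8);
        Self { bytes, offset: 0, len }
    }

    fn slice(self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len);
        let bit = self.offset + offset;
        // Whole bytes are skipped by advancing the slice start;
        // the sub-byte remainder must be kept as an explicit offset.
        Self { bytes: &self.bytes[bit / 8..], offset: bit % 8, len }
    }

    fn get(&self, i: usize) -> bool {
        assert!(i < self.len);
        let bit = self.offset + i;
        // Least-significant bit first within each byte.
        (self.bytes[bit / 8] >> (bit % 8)) & 1 == 1
    }
}

fn main() {
    let values = [10, 20, 30, 40, 50];
    println!("{:?}", slice_values(&values, 2, 2));

    let bytes = [0b1010_0101u8, 0b0000_1111u8];
    let sliced = BitmapSlice::new(&bytes, 16).slice(3, 8);
    println!("bit offset = {}, first bit = {}", sliced.offset, sliced.get(0));
}
```

In this sketch the values slice carries no offset at all, while the bitmap slice still needs one whenever the slice does not start on a byte boundary.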
> I have been benchmarking the consequences of this design choice and reached
> the conclusion that storing the offset on a per buffer basis offers at
> least 15% improvement in compute (results vary on kernel and likely
> implementation).
This seems to assume that many or most arrays will have non-zero
offsets. Is this something that commonly happens in the Rust Arrow
world? In Arrow C++ I'm not sure non-zero offsets appear very frequently.
Regards
Antoine.