Re: [Discuss] Single offset per array has a non-trivial performance implication

Antoine Pitrou Wed, 27 Oct 2021 10:57:04 -0700


On 26/10/2021 at 21:30, Jorge Cardoso Leitão wrote:

> Hi,
>
> One aspect of the design of "arrow2" is that it deals with array slices
> differently from the rest of the implementations. Essentially, the offset
> is not stored in ArrayData, but on each individual Buffer. Some important
> consequences are:
>
> * people can work with buffers and bitmaps without having to drag the
> corresponding array offset with them (which is a common source of
> unsoundness in the official Rust implementation)
> * arrays can store buffers/bitmaps with independent offsets
> * it does not roundtrip over the C data interface at zero cost, because the
> C data interface only allows a single offset per array, not per
> buffer/bitmap.
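
To illustrate the last bullet, here is a minimal Python sketch of why a single per-array offset can force a copy on export. It assumes LSB-first bit numbering (as in the Arrow format); the helper names `get_bit` and `realign_bitmap` are hypothetical and are not taken from arrow2 or any Arrow library. If a validity bitmap starts at bit 3 while the array must be exported with offset 5, the bitmap has to be rewritten so that its bits line up with the shared offset.

```python
def get_bit(bitmap: bytes, pos: int) -> int:
    """Return bit `pos` of `bitmap`, LSB-first within each byte."""
    return (bitmap[pos >> 3] >> (pos & 7)) & 1


def realign_bitmap(bitmap: bytes, src_offset: int, dst_offset: int, length: int) -> bytes:
    """Copy `length` bits starting at `src_offset` into a new bitmap
    whose logical start is `dst_offset` (hypothetical helper)."""
    out = bytearray((dst_offset + length + 7) // 8)
    for i in range(length):
        if get_bit(bitmap, src_offset + i):
            pos = dst_offset + i
            out[pos >> 3] |= 1 << (pos & 7)
    return bytes(out)


# Validity bitmap with its own offset of 3: logical bits are 1, 0, 1.
validity = bytes([0b00101000])
# The array is exported with a single offset of 5, so the bitmap is re-aligned.
realigned = realign_bitmap(validity, src_offset=3, dst_offset=5, length=3)
```

The loop copies bit by bit for clarity; a real implementation would shift whole words, but the point stands that the re-alignment allocates and writes a new buffer.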

To be clear, this only comes into play for bit buffers (such as the
validity bitmap), right? Otherwise, the offset can just be incorporated
into the buffer's base pointer.
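
A small Python sketch of that distinction, under stated assumptions: 32-bit little-endian values, LSB-first bitmaps, and hypothetical helper names (`slice_values`, `bit_at`) that do not correspond to any Arrow API. For a fixed-width value buffer, an element offset becomes a byte offset and the slice is just a shifted view. For a bitmap, an offset that is not a multiple of 8 cannot be expressed as a byte pointer, so it has to be carried alongside the buffer.

```python
import struct


def slice_values(buf: memoryview, itemsize: int, offset: int, length: int) -> memoryview:
    """Zero-copy slice: the element offset is folded into the view's start."""
    start = offset * itemsize
    return buf[start:start + length * itemsize]


def bit_at(bitmap: bytes, bit_offset: int, i: int) -> bool:
    """Read logical bit `i` of a bitmap whose data starts at `bit_offset`."""
    pos = bit_offset + i
    return (bitmap[pos >> 3] >> (pos & 7)) & 1 == 1


values = memoryview(struct.pack("<6i", 10, 11, 12, 13, 14, 15))
validity = bytes([0b00101101])  # bits 0, 2, 3 and 5 are set

# Slice [3, 6): the value buffer needs no separate offset afterwards...
sliced = slice_values(values, itemsize=4, offset=3, length=3)
sliced_ints = list(struct.unpack("<3i", sliced))

# ...but the bitmap still needs bit_offset=3 on every access.
sliced_valid = [bit_at(validity, 3, i) for i in range(3)]
```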

> I have been benchmarking the consequences of this design choice and reached
> the conclusion that storing the offset on a per buffer basis offers at
> least 15% improvement in compute (results vary on kernel and likely
> implementation).

This seems to assume that many or most arrays will have non-zero
offsets. Is this something that commonly happens in the Rust Arrow
world? In Arrow C++ I'm not sure non-zero offsets appear very frequently.


Regards

Antoine.
