>
> Is it to ensure O(1) random access (instead of having to sum all
> deltas up to the index)?


Yes, that is my understanding of why offsets were chosen.
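
To make the trade-off concrete, here is a minimal Python sketch using the example from the original message. It is not Arrow's actual implementation: the function names are made up for illustration, and null handling (the validity bitmap) is left out, so the None slot simply shows up as an empty value.

```python
from itertools import accumulate

# The example from the thread: ["a", "", None, "ab"]
values = b"aab"
offsets = [0, 1, 1, 1, 3]   # n + 1 entries
lengths = [1, 0, 0, 2]      # n entries


def get_with_offsets(values, offsets, i):
    # O(1): two array lookups give the start and end of slot i.
    return values[offsets[i]:offsets[i + 1]]


def get_with_lengths(values, lengths, i):
    # O(i): the start of slot i is the sum of all preceding lengths.
    start = sum(lengths[:i])
    return values[start:start + lengths[i]]


def lengths_to_offsets(lengths):
    # Offsets are the running sum of the lengths, with a leading 0.
    return [0, *accumulate(lengths)]
```

Both accessors return the same slices; the difference is only in how much work it takes to find where slot i starts. The flip side is the one raised in the question: the last offset equals the total size of the values buffer, so the offset type bounds the whole array, whereas a length would only have to bound a single element.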

On Thu, Jun 17, 2021 at 10:32 PM Jorge Cardoso Leitão <
[email protected]> wrote:

> Hi,
>
> (this has no direction; I am just genuinely curious)
>
> I am wondering, what is the rationale for using "offsets" instead of
> "lengths" to represent variable-sized arrays?
>
> I.e. ["a", "", None, "ab"] is represented as
>
> offsets: [0, 1, 1, 1, 3]
> values: "aab"
>
> What is the reasoning for using this over
>
> lengths: [1, 0, 0, 2]
> values: "aab"
>
> I am asking this because I have seen people using the LargeUtf8 type,
> or breaking record batches into chunks, to avoid hitting the i32
> offset ceiling on large string arrays.
>
> Is it to ensure O(1) random access (instead of having to sum all
> deltas up to the index)?
>
> Best,
> Jorge
>
