> > Is it to ensure O(1) random access (instead of having to sum all
> > deltas up to the index)?

This is my understanding of why it was chosen.

On Thu, Jun 17, 2021 at 10:32 PM Jorge Cardoso Leitão <[email protected]> wrote:

> Hi,
>
> (this has no direction; I am just genuinely curious)
>
> I am wondering, what is the rationale to use "offsets" instead of
> "lengths" to represent variable sized arrays?
>
> I.e. ["a", "", None, "ab"] is represented as
>
> offsets: [0, 1, 1, 1, 3]
> values: "aab"
>
> what is the reasoning to use this over
>
> lengths: [1, 0, 0, 2]
> values: "aab"
>
> I am asking this because I have seen people using the LargeUtf8 type,
> or breaking Record batches in chunks, to avoid hitting the ceiling of
> i32 of large arrays with strings.
>
> Is it to ensure O(1) random access (instead of having to sum all
> deltas up to the index)?
>
> Best,
> Jorge
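
To make the random-access point concrete, here is a minimal sketch in plain Python using the example from the quoted message. It is only an illustration, not how any Arrow implementation is written: the helper names (get_with_offsets, get_with_lengths, lengths_to_offsets) are made up for this sketch, and the validity bitmap that would mark the None slot is left out, so the null entry simply reads back as an empty string.

```python
from itertools import accumulate

# Example from the thread: ["a", "", None, "ab"]
values = "aab"
offsets = [0, 1, 1, 1, 3]   # n + 1 entries
lengths = [1, 0, 0, 2]      # n entries


def get_with_offsets(values, offsets, i):
    # Two index lookups, regardless of how large i is: O(1).
    return values[offsets[i]:offsets[i + 1]]


def get_with_lengths(values, lengths, i):
    # The start position is the sum of all preceding lengths: O(i).
    start = sum(lengths[:i])
    return values[start:start + lengths[i]]


def lengths_to_offsets(lengths):
    # Offsets are the running sum of lengths, with a leading 0.
    return [0, *accumulate(lengths)]


if __name__ == "__main__":
    for i in range(len(lengths)):
        print(i, repr(get_with_offsets(values, offsets, i)),
              repr(get_with_lengths(values, lengths, i)))
```

Both layouts describe the same slices; the difference is that the lengths version has to recompute a prefix sum on every access unless the offsets are materialized first, which is what lengths_to_offsets does.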
