Thank you both for your time and for your answers. I've made a mistake; you are right :-)

Best,
Jorge

On Sun, Feb 21, 2021 at 1:39 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> Hi Jorge,
> I'm not sure I understand your example, but I would expect any child array
> of a fixed size list to always have N*size of fixed size list elements. So
> for:
> [
>   null,
>   [a, bc],
>   [de, feg]
> ]
>
> (i.e. FixedSizeList<Binary>(2) with length 3, where the first element is
> null)
>
> I would expect the child array to have [0, 0, 0, 1, 3, 5, 8] as its
> indices (a total logical length=6).
>
> Which I think corresponds to your second representation of the child
> array? C++ validates FixedSizeLists in its validate method to meet these
> conditions [1]
>
> We should probably clarify the specification.
>
> -Micah
>
> [1]
> https://github.com/apache/arrow/blob/995abdc02fed412bbd947fe41a0765036dbbe820/cpp/src/arrow/array/validate.cc#L103
>
> On Sun, Feb 21, 2021 at 12:38 AM Jorge Cardoso Leitão
> <jorgecarlei...@gmail.com> wrote:
>
> > Hi,
> >
> > We state in the spec that:
> >
> > > A fixed size list type is specified like FixedSizeList<T>[N], where T is
> > > any type (*primitive or nested*) and N is a 32-bit signed integer
> > > representing the length of the lists.
> >
> > (emphasis mine)
> >
> > Now, suppose that we have FixedSizeList<Binary>[2], i.e. a fixed type
> > whose inner is a variable sized type, as follows
> >
> > [
> >   Null,
> >   [
> >     [[0], [1, 2]],
> >     [[3, 4], [5]],
> >   ]
> > ]
> >
> > Looking at the offsets of the binary, two options seem possible according
> > to the spec:
> >
> > 1. [0, 1, 3, 5, 6] (i.e. inner has len = 4)
> > 2. [0, 0, 0, 1, 3, 5, 6] (i.e. inner has len = 6)
> >
> > The difference in behavior emerges whenever we want to access the values
> > of the i'th slot of the fixed list, e.g. [ [[0], [1, 2]], [[3, 4], [5]] ]
> > above.
> >
> > With option 1, we can't slice the inner using `[i * 2, (i + 1) * 2]`: for
> > i = 1 this would correspond to the offsets `[3, 5, 6, out of bounds]` (the
> > result would still be wrong if this was in bounds, as it excluded the
> > `[[0], [1, 2]]`). In this case, we need to count the number of nulls,
> > `nulls`, up to `i` and take `[(i - nulls) * 2, (i - nulls + 1) * 2]`.
> >
> > If we use option 2, we can slice the binary directly using
> > `[i * 2, (i + 1) * 2]`: for i = 1, this would correspond to the offsets
> > `[0, 1, 3, 5, 6]`, which is correct.
> >
> > The challenge here is that there is no way to tell whether the inner
> > array fulfills this "sliceability" constraint or not. I can't find this
> > constraint in the spec. Do we enforce it somewhere? Note that this
> > behavior only affects FixedSizeList, but it does affect all variations
> > whose inner has a variable size (List, Binary, Utf8, etc).
> >
> > Any ideas?
> >
> > Best,
> > Jorge
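
For the archive, here is a minimal sketch of the two child layouts discussed in the quoted thread, using Micah's example [null, [a, bc], [de, feg]]. It is plain Python rather than any Arrow library; the function names (binary_at, slot_padded, slot_compact, child_length_ok) are hypothetical and only illustrate the index arithmetic, assuming a list size of 2 and a boolean validity list.

```python
def binary_at(values, offsets, j):
    """Return the j-th binary element of a (values, offsets) child array."""
    return values[offsets[j]:offsets[j + 1]]


def slot_padded(values, offsets, size, i):
    """Option 2: null slots still occupy `size` child entries, so slot i
    is simply child elements [i * size, (i + 1) * size)."""
    return [binary_at(values, offsets, j) for j in range(i * size, (i + 1) * size)]


def slot_compact(values, offsets, validity, size, i):
    """Option 1: null slots occupy no child entries, so the nulls before
    slot i must be counted before the child can be indexed."""
    nulls = sum(1 for valid in validity[:i] if not valid)
    base = (i - nulls) * size
    return [binary_at(values, offsets, j) for j in range(base, base + size)]


def child_length_ok(offsets, length, size):
    """The check Micah describes: child length must equal length * size."""
    return len(offsets) - 1 == length * size


values = b"abcdefeg"               # a, bc, de, feg concatenated
validity = [False, True, True]     # first slot is null
padded = [0, 0, 0, 1, 3, 5, 8]     # option 2: child length 6
compact = [0, 1, 3, 5, 8]          # option 1: child length 4

print(slot_padded(values, padded, 2, 1))
print(slot_compact(values, compact, validity, 2, 1))
print(child_length_ok(padded, 3, 2), child_length_ok(compact, 3, 2))
```

Both accessors return the same slot contents, but only the padded layout passes the length check and allows direct slicing without scanning the validity information.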