Hi,

We state in the spec that:

A fixed size list type is specified like FixedSizeList<T>[N], where T is
> any type (*primitive or nested*) and N is a 32-bit signed integer
> representing the length of the lists.
>

(emphasis mine)

Now, suppose that we have FixedSizeList<Binary>[2], i.e. a fixed type whose
inner is a variable sized type, as follows

[
    Null,
    [
        [[0], [1, 2]],
        [[3, 4], [5]],
    ]
]

Looking at the offsets of the binary, two options seem possible according
to the spec:

1. [0, 1, 3, 5, 6]  (i.e. inner has len = 4)
2. [0, 0, 0, 1, 3, 5, 6]  (i.e. inner has len = 6)

The difference in behavior emerges whenever we want to access the values of
the i'th slot of the fixed list, e.g. [ [[0], [1, 2]], [[3, 4], [5]] ]
above.

With option 1, we can't slice the inner using `[i * 2, (i + 1) * 2]`: for i
= 1 this would correspond to the offsets `[3, 5, 6, out of bounds]` (the
result would still be wrong if this was in bounds, as it excluded the
`[[0], [1, 2]]`). In this case, we need to count the number of nulls,
`nulls`, up to `i` and take `[(i - nulls) * 2, (i - nulls + 1) * 2]`.

If we use option 2, we can slice the binary directly using `[i * 2, (i + 1)
* 2]`: for i = 1, this would correspond to the offsets `[0, 1, 3, 5, 6]`,
which is correct.

The challenge here is that there is no way to tell whether the inner array
fulfills this "sliceability" constraint or not. I can't find this
constraint in the spec. Do we enforce it somewhere? Note that this behavior
only affects FixedSizeList, but it does affect all variations whose inner
has a variable size (List, Binary, Utf8, etc).

Any ideas?

Best,
Jorge

Reply via email to