For what it is worth, arrow-rs takes the most permissive interpretation, 3: we only reject an unambiguously malformed StructArray. For further context, I believe the instigator of this email thread is [1].

I think the main question with taking one of the stricter interpretations is what value is assigned to "masked" values (child slots under a null struct) when parsing from some other format, such as JSON or Parquet, that doesn't encode them. Some people think it should be NULL, others that it can be arbitrary. For example, when arrow-rs changed the Parquet reader from using NULL to arbitrary values, it was actually reported as a bug [2].
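
To make the two camps concrete, here is a rough Python sketch, not arrow-rs code. The function name `parse_struct_column`, the `masked_as_null` flag, and the placeholder value 0 are all hypothetical, chosen only for illustration. It parses a JSON list of optional objects into a struct validity list plus child values, filling the masked slots either with None or with an arbitrary value:

```python
import json

def parse_struct_column(text, field, masked_as_null):
    # Hypothetical helper: one JSON value per row, null meaning a null struct.
    rows = json.loads(text)
    struct_valid = [row is not None for row in rows]
    # The source does not encode a child value under a null struct,
    # so the reader has to pick one: None, or an arbitrary placeholder (0 here).
    filler = None if masked_as_null else 0
    child = [row[field] if row is not None else filler for row in rows]
    return struct_valid, child

doc = '[{"a": 1}, null, null, {"a": 4}]'
print(parse_struct_column(doc, "a", masked_as_null=True))
print(parse_struct_column(doc, "a", masked_as_null=False))
```

Both outputs describe the same logical data; they differ only in what sits in the child under the null structs, which is exactly where a stricter interpretation would have to pick a side.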

My 2 cents is that this is a bit like the question around whether StructArray can have fields with the same name. If something had been standardised at the start that would be one thing, but retroactively adding schema restrictions now is likely to break existing workflows, and is therefore probably best avoided.

Kind Regards,

Raphael

[1]: https://github.com/apache/arrow-rs/issues/9302
[2]: https://github.com/apache/arrow-rs/issues/7119

On 29/01/2026 19:10, Raz Luvaton wrote:
Currently there is ambiguity about what the validity buffer for a
non-nullable field of a nullable struct can be.

Let's take, for example, the following type:
```
nullable StructArray with non nullable field Int32
```
The struct validity is: valid, null, null, valid.

Which of the following should hold?
1. The child array (the Int32 array) is FORBIDDEN from having nulls at all
(i.e. in our example the validity buffer for the child must be valid,
valid, valid, valid), as the field is marked as non-nullable.
2. The child array is REQUIRED to have nulls at the same positions as the
struct nulls, i.e. the validity buffer for the child MUST be valid, null,
null, valid in our example.
3. The child array MAY have nulls, but it is FORBIDDEN from having nulls
where the struct does not have nulls, i.e. it can't have null, null, valid,
valid, but it can have valid, null, valid, valid in our example.
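
As a rough illustration, the three options can be written as predicates over plain boolean validity lists. This is only a Python sketch, not any Arrow implementation's API; the function names are hypothetical, and True stands for valid, False for null:

```python
# Hypothetical helpers; True = valid, False = null.

def option_1(struct_valid, child_valid):
    # Option 1: the child may not contain any nulls at all.
    return all(child_valid)

def option_2(struct_valid, child_valid):
    # Option 2: the child's validity must match the struct's validity exactly.
    return list(child_valid) == list(struct_valid)

def option_3(struct_valid, child_valid):
    # Option 3: the child may be null only where the struct is also null.
    return all(c or not s for s, c in zip(struct_valid, child_valid))

# The struct validity from the example: valid, null, null, valid.
struct_valid = [True, False, False, True]
```

With these definitions, a child of valid, null, valid, valid passes only option 3, while null, null, valid, valid fails all three.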

I would argue that 1 is the correct and expected requirement, as the field
is marked as non-nullable.

The chosen behavior will apply to other nested types as well.


Thanks, Raz Luvaton
