Hi Raphael,

I think this is indeed a documentation mistake; it should say 0!

For exactly the reasons you mentioned, I decided that it is best to
always leave the null count field at 0 for RLE arrays. This way it is
at least consistent with union types.

RunLengthEncoded data should not contain a null mask of its own. The
idea so far is that null is just one of the possible values for a run.
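To make this concrete, here is a minimal sketch in plain Python, not the
Arrow API: the lists stand in for the run_ends and values children, None
marks a null value, and `logical_null_count` is a hypothetical helper.
The parent keeps a null count of 0, and logical nulls can only be
counted by walking the runs, which is also why a sliced view needs the
run ends to recompute it.

```python
def logical_null_count(run_ends, values, offset=0, length=None):
    """Count logical nulls in the window [offset, offset + length).

    Hypothetical model: the parent array has no validity bitmap (its
    null_count stays 0); a null is simply a run whose value is None.
    """
    total = run_ends[-1] if run_ends else 0
    if length is None:
        length = total - offset
    end = offset + length
    nulls = 0
    run_start = 0
    for run_end, value in zip(run_ends, values):
        # Number of logical slots where run [run_start, run_end)
        # overlaps the requested window.
        overlap = min(run_end, end) - max(run_start, offset)
        if overlap > 0 and value is None:
            nulls += overlap
        run_start = run_end
    return nulls


# Logical array: a, a, null, null, null, b
run_ends = [2, 5, 6]
values = ["a", None, "b"]
print(logical_null_count(run_ends, values))                      # 3
print(logical_null_count(run_ends, values, offset=1, length=2))  # 1
```

Note that the same parent array yields different logical null counts
for different slices, so a single stored null count could not describe
them without consulting the runs.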

(If we were to allow the RLE parent array to have an additional null
mask, the null count field would describe that mask. There seems to be
a general assumption in Arrow code that a non-zero null count, except
for the Null type where it equals the array length, means that the
standard null mask is present.)

Best,
Tobias 

On 2023/01/22 15:12:32 Raphael Taylor-Davies wrote:
> Hi,
> 
> Apologies if I am rehashing something that has already been discussed or
> is documented elsewhere, but reading the documentation of the Run-Length
> encoding [1] I noticed that the parent null count can be non-zero [2].
> 
> This is somewhat surprising to me for a couple of reasons:
> 
> - This is inconsistent with how it is handled for other nested types 
> like dictionaries, structs, etc... where a null count is solely the 
> number of nulls in the mask of that Array
> - Codepaths that use null counts to infer validity mask properties such
> as presence, bit counts, etc... will no longer work
> - This null count can only be recomputed in the context of the run-ends,
> implying codepaths that slice ArrayData or otherwise manipulate
> ArrayData directly must be run-length aware
> 
> This leads to a couple of questions:
> 
> - Is this a documentation mistake, or is the null count of RunEndEncoded
> ArrayData determined by its children?
> - Can a RunEndEncoded ArrayData contain a null mask itself,
> independently of its runs, much like dictionary arrays can?
> 
> Any clarifications would be most welcome
> 
> [1]: https://arrow.apache.org/docs/dev/format/Columnar.html#run-end-encoded-layout
> [2]: https://github.com/apache/arrow/pull/13333/files#r1083470362
> 
> 
