Hi Tobias,
Thank you for clarifying this, that makes sense to me.
Kind Regards,
Raphael
On 22/01/2023 16:15, Tobias Zagorni wrote:
Hi Raphael,
I think this is indeed a documentation mistake; it should say 0!
For exactly the reasons you mentioned, I determined that it is best
to leave the null count field always 0 for RLE arrays. This way it is
consistent with union types, at least.
RunLengthEncoded data should not contain a null mask by itself. The
idea so far is that Null is just one of the possible values for a run.
(If we were to allow the RLE array parent to have an additional null
mask, the null count field would represent that. There seems to be a
general assumption in Arrow code that a non-zero null count (or array
length, for the NULL type) means the presence of the standard null
mask.)
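To illustrate the idea, here is a minimal sketch in plain Python. It is
a toy model and not the Arrow API: the class name RunEndEncoded and its
fields are made up for this example. It only shows a null stored as the
value of a run while the parent null count field stays 0.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class RunEndEncoded:
    """Toy model of a run-end encoded array (hypothetical, not the Arrow API)."""

    run_ends: List[int]  # cumulative, exclusive end index of each run
    values: List[Any]    # one value per run; None marks a run of nulls
    null_count: int = 0  # parent field: left at 0, the parent has no null mask

    def decode(self) -> List[Any]:
        """Expand the runs back into the logical sequence of values."""
        out: List[Any] = []
        start = 0
        for end, value in zip(self.run_ends, self.values):
            out.extend([value] * (end - start))
            start = end
        return out


# Three runs: "a" x 3, null x 2, "b" x 4.
ree = RunEndEncoded(run_ends=[3, 5, 9], values=["a", None, "b"])
decoded = ree.decode()
```

In this model the values child holds one null (one null run), the
decoded sequence holds two logical nulls, and the parent null count
field is still 0.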
Best,
Tobias
On 2023/01/22 15:12:32 Raphael Taylor-Davies wrote:
Hi,
Apologies if I am rehashing something that has already been discussed
or is documented elsewhere, but reading the documentation of the
Run-Length encoding [1] I noticed that the parent null count can be
non-zero [2].
This is somewhat surprising to me for a couple of reasons:
- This is inconsistent with how it is handled for other nested types
like dictionaries, structs, etc... where a null count is solely the
number of nulls in the mask of that Array
- Codepaths that use null counts to infer validity mask properties
such as presence, bit counts, etc... will no longer work
- This null count can only be recomputed in the context of the
run-ends, implying codepaths that slice ArrayData or otherwise
manipulate ArrayData directly must be run-length aware
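The last point can be sketched in plain Python. This is a toy model
under stated assumptions, not the Arrow API: the function name
logical_null_count and the list-based layout are hypothetical. It shows
that counting nulls for a slice needs the run-ends, not just the values
child.

```python
from bisect import bisect_right
from typing import Any, List


def logical_null_count(
    run_ends: List[int], values: List[Any], offset: int, length: int
) -> int:
    """Count logical nulls in [offset, offset + length) of a toy
    run-end encoded array, where None in `values` marks a null run."""
    stop = offset + length
    nulls = 0
    # Index of the run containing `offset`: first run end > offset.
    i = bisect_right(run_ends, offset)
    start = run_ends[i - 1] if i > 0 else 0
    while i < len(run_ends) and start < stop:
        end = run_ends[i]
        if values[i] is None:
            # Only the part of the run that overlaps the slice counts.
            nulls += min(end, stop) - max(start, offset)
        start = end
        i += 1
    return nulls


run_ends = [3, 5, 9]           # runs cover [0, 3), [3, 5), [5, 9)
values = ["a", None, "b"]      # the middle run is null
full = logical_null_count(run_ends, values, 0, 9)
sliced = logical_null_count(run_ends, values, 4, 3)
```

Slicing to offset 4, length 3 cuts the null run in half, so the count
changes even though the values child is untouched.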
This leads to a couple of questions:
- Is this a documentation mistake, or is the null count of
RunEndEncoded ArrayData determined by its children?
- Can a RunEndEncoded ArrayData contain a null mask itself,
independently of its runs, much like dictionary arrays can?
Any clarifications would be most welcome.
[1]: https://arrow.apache.org/docs/dev/format/Columnar.html#run-end-encoded-layout
[2]: https://github.com/apache/arrow/pull/13333/files#r1083470362