Original discussion at
https://github.com/apache/arrow/pull/9471#issuecomment-779944257 (PR for
https://issues.apache.org/jira/browse/ARROW-11595 )

Although the format does not specify what is contained in array slots
masked by null bits (for example, the first byte in the data buffer of an
int8 array whose first slot is null), there are other considerations that
might motivate establishing conventions for the contents of those slots in
some arrays created by the C++ implementation:
- No spurious complaints from valgrind when running otherwise safe
element-wise compute kernels on values under null bits. In the case of
ARROW-11595, the values buffer of the result of casting from Type::NA to
Type::INT8 is left uninitialized but masked by an entirely-null validity
bitmap. When such an array is passed to a comparison kernel, a branch on
the uninitialized values triggered a valgrind warning even though the
results of that branch were also masked by an all-null validity bitmap.
- If the underlying values were allocated but not initialized, they may leak
private information such as private keys, passwords, or tokens that were
placed in that memory and then freed by an application without being
overwritten.
- Improved compression of data buffers (for example when writing to the IPC
format), since a run of nulls would correspond to consistent, repeated
values in all buffers.
- Deterministic output from operations that are unable to honor null
bitmaps, such as computing the checksum of an IPC file.
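A minimal, self-contained C++ sketch of one possible convention (every value
slot masked by a null bit holds zero) may make the last two points concrete.
It assumes only the Arrow bitmap layout (LSB-first, 1 = valid); the
`Int8ArraySketch` struct, `MakeAllNullInt8`, `ZeroValuesUnderNulls`, and the
FNV-1a checksum are hypothetical illustrations, not part of the Arrow C++ API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an int8 array (NOT the Arrow C++ API):
// a validity bitmap (LSB-first, 1 = valid) plus a values buffer.
struct Int8ArraySketch {
  int64_t length = 0;
  std::vector<uint8_t> validity;  // ceil(length / 8) bytes
  std::vector<int8_t> values;     // length bytes
};

// Reads bit i of an LSB-first bitmap.
inline bool IsValid(const std::vector<uint8_t>& bitmap, int64_t i) {
  return ((bitmap[static_cast<size_t>(i / 8)] >> (i % 8)) & 1) != 0;
}

// Analogue of casting Type::NA to Type::INT8: every slot is null, and the
// values buffer is zero-filled instead of being left uninitialized.
inline Int8ArraySketch MakeAllNullInt8(int64_t length) {
  Int8ArraySketch arr;
  arr.length = length;
  arr.validity.assign(static_cast<size_t>((length + 7) / 8), 0);  // all null
  arr.values.assign(static_cast<size_t>(length), 0);              // zeroed
  return arr;
}

// Post-pass enforcing the convention on an existing array: overwrite any
// stale bytes under null slots with zero; valid slots are left untouched.
inline void ZeroValuesUnderNulls(Int8ArraySketch* arr) {
  for (int64_t i = 0; i < arr->length; ++i) {
    if (!IsValid(arr->validity, i)) arr->values[static_cast<size_t>(i)] = 0;
  }
}

// FNV-1a over raw bytes: a stand-in for any checksum that cannot honor nulls.
inline uint64_t Fnv1a(const void* data, size_t n) {
  const auto* p = static_cast<const unsigned char*>(data);
  uint64_t h = 14695981039346656037ull;
  for (size_t i = 0; i < n; ++i) {
    h ^= p[i];
    h *= 1099511628211ull;
  }
  return h;
}
```

With this convention, two all-null arrays of the same length hash identically.
An array carrying stale bytes under a null slot (standing in for reused,
unscrubbed memory) hashes differently until `ZeroValuesUnderNulls` is applied.
Zeroing at allocation time avoids the extra pass. The post-pass is shown only
to illustrate the invariant.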
