Original discussion at https://github.com/apache/arrow/pull/9471#issuecomment-779944257 (PR for https://issues.apache.org/jira/browse/ARROW-11595 )
Although the format does not specify what is contained in array slots masked by null bits (for example, the first byte in the data buffer of an int8 array whose first slot is null), there are other considerations which might motivate establishing conventions for some arrays created by the C++ implementation:

- No spurious complaints from valgrind when running otherwise safe element-wise compute kernels on values under null bits. In the case of ARROW-11595, the values buffer of the result of casting from Type::NA to Type::INT8 is left uninitialized but masked by an entirely-null validity bitmap. When such an array was passed to a comparison kernel, a branch on the uninitialized values triggered a valgrind error even though the results of that branch were also masked by an entirely-null validity bitmap.
- If the underlying values were allocated but not initialized, they may leak private information such as private keys, passwords, or tokens which the application placed in that memory and then freed without overwriting.
- Improved compression of data buffers (for example, when writing to the IPC format), since a run of nulls would correspond to consistent, repeated values in all buffers.
- Deterministic output from operations which are unable to honor null bitmaps, such as computing the checksum of an IPC file.
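To make the idea of a convention for values under null bits concrete, here is a minimal sketch in plain C++ that does not use the Arrow library. It assumes zero as the conventional fill value, which is only one possible choice. The helper names `BitIsSet` and `ZeroValuesUnderNulls` are hypothetical and are not Arrow APIs. The bitmap is read least-significant bit first, as in the Arrow columnar format.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if bit i of a validity bitmap is set (slot i is valid).
// Bits are numbered least-significant first within each byte.
inline bool BitIsSet(const std::vector<uint8_t>& bitmap, std::size_t i) {
  return ((bitmap[i / 8] >> (i % 8)) & 1) != 0;
}

// Hypothetical helper (not an Arrow API): overwrite every int8 value whose
// validity bit is unset with zero, so null slots hold a deterministic value.
// Assumes the bitmap holds at least values.size() bits.
inline void ZeroValuesUnderNulls(const std::vector<uint8_t>& validity,
                                 std::vector<int8_t>& values) {
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (!BitIsSet(validity, i)) {
      values[i] = 0;
    }
  }
}
```

With such a convention, an entirely-null array would carry an all-zero values buffer, so a kernel that reads values without consulting the bitmap never touches uninitialized memory, and the buffer's bytes are the same from run to run.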