I am definitely in the camp that we should not leak past data through uninitialized Arrow memory (for example by transmitting such buffers using Arrow IPC).
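To make the idea concrete, here is a minimal sketch of zeroing the value slots that sit under null bits before a buffer leaves the process. It is plain Python rather than the Arrow C++ API, and the function name `zero_masked_slots` and the sample buffers are hypothetical. The only assumption taken from the format is that validity bitmaps use least-significant-bit numbering, with a set bit meaning "valid".

```python
def zero_masked_slots(values: bytearray, validity: bytes,
                      length: int, byte_width: int = 1) -> None:
    """Overwrite, in place, every fixed-width slot whose validity bit is 0.

    Hypothetical helper for illustration only; not an Arrow function.
    """
    for i in range(length):
        # Bit i lives in byte i // 8, at position i % 8 (LSB first).
        is_valid = (validity[i // 8] >> (i % 8)) & 1
        if not is_valid:
            start = i * byte_width
            values[start:start + byte_width] = bytes(byte_width)


# Hypothetical int8 array of length 4 whose slots 0 and 2 are null.
# The bytes under the null slots stand in for leftover heap contents.
values = bytearray([0x7F, 0x2A, 0xDE, 0x05])
validity = bytes([0b00001010])  # only slots 1 and 3 are valid

zero_masked_slots(values, validity, length=4)
```

After the call, the valid slots are unchanged and the masked slots hold zeros, so nothing from the previous contents of that memory would be transmitted. A real implementation would presumably work on whole words and skip buffers already known to be initialized.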

Regards

Antoine.


On 20/02/2021 at 21:17, Benjamin Kietzman wrote:
> Original discussion at
> https://github.com/apache/arrow/pull/9471#issuecomment-779944257 (PR for
> https://issues.apache.org/jira/browse/ARROW-11595 )
>
> Although the format does not specify what is contained in array slots
> masked by null bits (for example the first byte in the data buffer of an
> int8 array whose first slot is null), there are other considerations which
> might motivate establishing conventions for some arrays created by the C++
> implementation:
> - No spurious complaints from valgrind when running otherwise safe
> element-wise compute kernels on values under null bits. In the case of
> ARROW-11595, the values buffer of the result of casting from Type::NA to
> Type::INT8 is left uninitialized but masked by an entirely-null validity
> bitmap. When such an array is passed to a comparison kernel, a branch on
> the uninitialized values triggered valgrind even though the results of that
> branch were also masked by an empty validity bitmap.
> - If the underlying values were allocated but not initialized they may leak
> private information such as private keys, passwords, or tokens which were
> placed in that memory then freed by an application without overwrite
> - Improved compression of data buffers (for example in writing to the IPC
> format), since a run of nulls would correspond to consistent, repeated
> values in all buffers
> - Deterministic output from operations which are unable to honor null
> bitmaps, such as computing the checksum of an IPC file
>