I am definitely in the camp that we should not leak past data through
uninitialized Arrow memory (for example by transmitting such buffers
using Arrow IPC).

Regards

Antoine.


On 20/02/2021 at 21:17, Benjamin Kietzman wrote:
> Original discussion at
> https://github.com/apache/arrow/pull/9471#issuecomment-779944257 (PR for
> https://issues.apache.org/jira/browse/ARROW-11595 )
> 
> Although the format does not specify what is contained in array slots
> masked by null bits (for example the first byte in the data buffer of an
> int8 array whose first slot is null), there are other considerations which
> might motivate establishing conventions for some arrays created by the C++
> implementation:
> - No spurious complaints from valgrind when running otherwise safe
> element-wise compute kernels on values under null bits. In the case of
> ARROW-11595, the values buffer of the result of casting from Type::NA to
> Type::INT8 is left uninitialized but masked by an entirely-null validity
> bitmap. When such an array is passed to a comparison kernel, a branch on
> the uninitialized values triggers a valgrind error even though the results
> of that branch are also masked by an entirely-null validity bitmap.
> - If the underlying values were allocated but not initialized, they may leak
> private information such as private keys, passwords, or tokens which were
> placed in that memory and then freed by an application without being
> overwritten
> - Improved compression of data buffers (for example in writing to the IPC
> format), since a run of nulls would correspond to consistent, repeated
> values in all buffers
> - Deterministic output from operations which are unable to honor null
> bitmaps, such as computing the checksum of an IPC file
> 
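
To make the determinism point concrete, here is a minimal sketch in plain
Python (standard library only, not the Arrow API). The helper name
zero_masked_slots, the list-of-bools validity mask, and the sample byte values
are all assumptions made for illustration; a real implementation would work on
Arrow's packed validity bitmap instead.

```python
import zlib
from typing import Sequence


def zero_masked_slots(values: bytes, validity: Sequence[bool]) -> bytes:
    """Return a copy of an int8 values buffer with every null slot set to 0.

    Hypothetical helper: `validity` is one bool per slot, not a packed bitmap.
    """
    return bytes(v if valid else 0 for v, valid in zip(values, validity))


# Two logically equal int8 arrays: slots 1 and 3 are null in both, but the
# bytes under the null bits differ (a stand-in for stale, uninitialized memory).
validity = [True, False, True, False]
a = bytes([7, 0x5A, 9, 0xC3])
b = bytes([7, 0x11, 9, 0xEE])

# A checksum over the raw buffers cannot honor the validity mask, so the two
# logically equal arrays hash differently.
raw_equal = zlib.crc32(a) == zlib.crc32(b)

# Once the masked slots are normalized, the checksums agree.
clean_a = zero_masked_slots(a, validity)
clean_b = zero_masked_slots(b, validity)
clean_equal = zlib.crc32(clean_a) == zlib.crc32(clean_b)
```

The same normalization also overwrites whatever stale bytes sat under the null
bits, which is the leak concern raised above.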
