On 04/03/2022 at 04:17, Hanqi Wu wrote:
> Hello community,
> According to the documentation below, an Arrow StructArray has no physical
> buffers backing it if it does not contain any null values:
> https://arrow.apache.org/docs/format/Columnar.html#struct-layout
> However, PyArrow complains if you try to import from C an ArrowArray
> representing a Struct type without a validity bitmap (i.e. with no nulls),
> which the Arrow spec above permits.
> More specifically, when importing from C, it expects the number of buffers
> to be 1, as coded here:
> https://github.com/apache/arrow/blob/8e43f23dcc6a9e630516228f110c48b64d13cec6/cpp/src/arrow/c/bridge.cc#L1332
> This seems to suggest that it always expects the validity bitmap.
Not really. It expects one entry in the `buffers` array
(`n_buffers == 1`), but the entry can be NULL:
"""The pointer to the null bitmap buffer, if the data type specifies
one, MAY be NULL only if ArrowArray.null_count is 0."""
https://arrow.apache.org/docs/format/CDataInterface.html#c.ArrowArray.buffers
You can see the corresponding logic in the import code here:
https://github.com/apache/arrow/blob/8e43f23dcc6a9e630516228f110c48b64d13cec6/cpp/src/arrow/c/bridge.cc#L1423-L1431
Regards
Antoine.