> On Mar 4, 2022, at 9:08 AM, Antoine Pitrou <anto...@python.org> wrote:
>
>
> I opened https://issues.apache.org/jira/browse/ARROW-15846
> Regards
>
> Antoine.
>
>
> On 04/03/2022 at 15:05, Antoine Pitrou wrote:
>> On 04/03/2022 at 15:01, Hanqi Wu wrote:
>>> Hi Antoine,
>>>
>>> I agree n_buffers should still be set to 1. But per the Arrow doc quoted
>>> below, n_buffers would be 0 if a struct array has no null values. This
>>> is what confuses me.
>>>
>>> "A struct array does not have any additional allocated physical storage for
>>> its values. A struct array must still have an allocated validity bitmap, if
>>> it has one or more null values."
>> Ok, the wording is clumsy, but note "*allocated* validity bitmap" :-) In
>> other words, if the null count is 0, the validity bitmap need not be
>> allocated, but it's still "present" in the metadata (for example as a
>> null pointer, if using the C data interface).
>> This probably deserves clarifying, though. I'll open an issue.
>> Regards
>> Antoine.
>>>
>>> https://arrow.apache.org/docs/format/Columnar.html#struct-layout
>>> Thanks,
>>> Hanqi
>>>
>>> On Mar 4, 2022, at 8:57 AM, Antoine Pitrou
>>> <anto...@python.org> wrote:
>>>
>>>
>>> Hi Hanqi,
>>>
>>> On 04/03/2022 at 14:53, Hanqi Wu wrote:
>>> Hi Antoine,
>>> I agree. But my question is about an Arrow StructArray with no null
>>> values. In this case, per the documentation, n_buffers should be set to 0.
>>>
>>> Well, no. As I said, it should still be 1.
>>>
>>> You can also take a look at the fields produced when exporting such an
>>> array.
>>>
>>> Regards
>>>
>>> Antoine.
>>>
>>>
>>>
>>>
>>> However, “import_from_c” expects StructArray to always have at least 1
>>> buffer allocated, otherwise it throws an exception.
>>> Best,
>>> Hanqi
>>> On Mar 4, 2022, at 8:47 AM, Antoine Pitrou
>>> <anto...@python.org> wrote:
>>>
>>>
>>> On 04/03/2022 at 04:17, Hanqi Wu wrote:
>>> Hello community,
>>> Per the documentation below, an Arrow StructArray has no physical buffers
>>> backing it if it contains no null values:
>>> https://arrow.apache.org/docs/format/Columnar.html#struct-layout
>>> However, PyArrow complains if you try to import from C an ArrowArray
>>> representing a Struct type without a validity bitmap (no nulls), which,
>>> according to the Arrow spec above, is permitted.
>>> In more detail, the import from C expects the number of buffers to be 1,
>>> as coded here:
>>> https://github.com/apache/arrow/blob/8e43f23dcc6a9e630516228f110c48b64d13cec6/cpp/src/arrow/c/bridge.cc#L1332
>>> This seems to suggest it always expects a validity bitmap.
>>>
>>> Not really. It expects one entry in the `buffers` array
>>> (`n_buffers == 1`), but the entry can be NULL:
>>>
>>> """The pointer to the null bitmap buffer, if the data type specifies one,
>>> MAY be NULL only if ArrowArray.null_count is 0."""
>>>
>>> https://arrow.apache.org/docs/format/CDataInterface.html#c.ArrowArray.buffers
>>> You can also see the corresponding logic in the import code here:
>>> https://github.com/apache/arrow/blob/8e43f23dcc6a9e630516228f110c48b64d13cec6/cpp/src/arrow/c/bridge.cc#L1423-L1431
>>>
>>> Regards
>>>
>>> Antoine.
>>>
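
The rule discussed in this thread (a struct array always has one entry in
`buffers`, and that entry may be NULL only when `null_count` is 0) can be
sketched with the Python standard library alone. This is a hedged
illustration, not Arrow's import code: the `ArrowArray` class below is a
simplified ctypes mirror of the C struct (the `release` callback is typed as
a plain pointer, and child arrays are omitted), and the helper names
`make_struct_array_without_nulls` and `validity_bitmap_is_null` are
hypothetical.

```python
import ctypes


class ArrowArray(ctypes.Structure):
    """Simplified ctypes mirror of the C data interface's struct ArrowArray."""


# Fields are assigned after the class exists because the struct refers to itself.
ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowArray))),
    ("dictionary", ctypes.POINTER(ArrowArray)),
    ("release", ctypes.c_void_p),  # assumption: function pointer kept opaque
    ("private_data", ctypes.c_void_p),
]


def make_struct_array_without_nulls(length):
    """Hypothetical helper: describe a struct array that has no null values.

    The buffers array has exactly one slot (the validity bitmap), and that
    slot is left as a NULL pointer because null_count is 0.
    """
    buffers = (ctypes.c_void_p * 1)()  # zero-initialised, so buffers[0] is NULL
    array = ArrowArray(length=length, null_count=0, offset=0,
                       n_buffers=1, n_children=0)
    array.buffers = ctypes.cast(buffers, ctypes.POINTER(ctypes.c_void_p))
    return array, buffers  # return the storage too so it stays alive


def validity_bitmap_is_null(array):
    """Hypothetical check mirroring the rule quoted from the spec."""
    if array.n_buffers != 1:
        raise ValueError(f"expected n_buffers == 1, got {array.n_buffers}")
    validity = array.buffers[0]  # ctypes yields None for a NULL pointer
    if validity is None and array.null_count != 0:
        raise ValueError("validity bitmap may be NULL only if null_count is 0")
    return validity is None


array, keep_alive = make_struct_array_without_nulls(3)
print(validity_bitmap_is_null(array))
```

Setting `n_buffers` to 0 on the same structure makes the check raise, which
corresponds to the import error reported at the start of the thread.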