> On Mar 4, 2022, at 9:08 AM, Antoine Pitrou <anto...@python.org> wrote:
> 
> 
> I opened https://issues.apache.org/jira/browse/ARROW-15846 
> Regards
> 
> Antoine.
> 
> 
> On 04/03/2022 at 15:05, Antoine Pitrou wrote:
>> On 04/03/2022 at 15:01, Hanqi Wu wrote:
>>> Hi Antoine,
>>> 
>>> I agree n_buffers should still be set to 1. But according to the PyArrow 
>>> doc quoted below, n_buffers would be 0 if a struct array has no null 
>>> values. This is what confuses me.
>>> 
>>> "A struct array does not have any additional allocated physical storage for 
>>> its values. A struct array must still have an allocated validity bitmap, if 
>>> it has one or more null values."
>> Ok, the wording is clumsy, but note "*allocated* validity bitmap" :-) In
>> other words, if the null count is 0, the validity bitmap need not be
>> allocated, but it's still "present" in the metadata (for example as a
>> null pointer, if using the C data interface).
>> This probably deserves clarifying, though. I'll open an issue.
>> Regards
>> Antoine.
>>> 
>>> https://arrow.apache.org/docs/format/Columnar.html#struct-layout 
>>> Thanks,
>>> Hanqi
>>> 
>>> On Mar 4, 2022, at 8:57 AM, Antoine Pitrou <anto...@python.org> wrote:
>>> 
>>> 
>>> Hi Hanqi,
>>> 
>>> On 04/03/2022 at 14:53, Hanqi Wu wrote:
>>> Hi Antoine,
>>> I agree. But my question is about an Arrow StructArray with no null values. 
>>> In this case, as per the documentation, n_buffers should be set to 0.
>>> 
>>> Well, no.  As I said, it should still be 1.
>>> 
>>> You can also take a look at the fields produced when exporting such an 
>>> array.
>>> 
>>> Regards
>>> 
>>> Antoine.
>>> 
>>> 
>>> 
>>> 
>>> However, “import_from_c” expects StructArray to always have at least 1 
>>> buffer allocated, otherwise it throws an exception.
>>> Best,
>>> Hanqi
>>> On Mar 4, 2022, at 8:47 AM, Antoine Pitrou <anto...@python.org> wrote:
>>> 
>>> 
>>> On 04/03/2022 at 04:17, Hanqi Wu wrote:
>>> Hello community,
>>> According to the documentation below, an Arrow StructArray has no physical 
>>> buffers backing it if it contains no null values:
>>> https://arrow.apache.org/docs/format/Columnar.html#struct-layout
>>> However, PyArrow complains if you try to import from C an ArrowArray of 
>>> Struct type without a null vector (no nulls), which, according to the Arrow 
>>> spec above, is permitted.
>>> To be more detailed, when doing import from C, it expects the number of 
>>> buffers to be 1, as coded here:
>>> https://github.com/apache/arrow/blob/8e43f23dcc6a9e630516228f110c48b64d13cec6/cpp/src/arrow/c/bridge.cc#L1332
>>> Which seems to suggest it will always expect the validity bitmap.
>>> 
>>> Not really.  It expects one entry in the `buffers` array
>>> (`n_buffers == 1`), but the entry can be NULL:
>>> 
>>> """The pointer to the null bitmap buffer, if the data type specifies one, 
>>> MAY be NULL only if ArrowArray.null_count is 0."""
>>> 
>>> https://arrow.apache.org/docs/format/CDataInterface.html#c.ArrowArray.buffers
>>> You can also see the corresponding logic in the import code here:
>>> https://github.com/apache/arrow/blob/8e43f23dcc6a9e630516228f110c48b64d13cec6/cpp/src/arrow/c/bridge.cc#L1423-L1431
>>> 
>>> Regards
>>> 
>>> Antoine.
>>> 
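
To make the point about `n_buffers == 1` with a NULL entry concrete, here is a minimal sketch using only Python's standard `ctypes` module. It does not use PyArrow. The `ArrowArray` layout follows the struct published in the C data interface spec; `make_struct_array_without_nulls` and `check_struct_buffers` are hypothetical helpers written for this illustration, and the check only restates the rule quoted in the thread. It is not the code in `bridge.cc`. Child arrays are omitted for brevity.

```python
import ctypes


class ArrowArray(ctypes.Structure):
    """ctypes mirror of the ArrowArray struct from the C data interface."""


# Fields are assigned after the class statement because the struct refers to itself.
ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowArray))),
    ("dictionary", ctypes.POINTER(ArrowArray)),
    # Function pointer in C; kept as an opaque pointer in this sketch.
    ("release", ctypes.c_void_p),
    ("private_data", ctypes.c_void_p),
]


def make_struct_array_without_nulls(length):
    """Hypothetical helper: a struct-typed ArrowArray with no nulls and no children."""
    # One slot for the validity bitmap; zero-initialised, so the slot holds NULL.
    buffers = (ctypes.c_void_p * 1)()
    array = ArrowArray(length=length, null_count=0, offset=0,
                       n_buffers=1, n_children=0)
    array.buffers = ctypes.cast(buffers, ctypes.POINTER(ctypes.c_void_p))
    # Return the backing storage too, so the caller keeps it alive.
    return array, buffers


def check_struct_buffers(array):
    """Hypothetical check restating the rule from the thread.

    Returns True if a validity bitmap is allocated, False if the slot is NULL.
    """
    if array.n_buffers != 1:
        raise ValueError(
            f"expected 1 buffer slot for a struct array, got {array.n_buffers}")
    validity = array.buffers[0]  # None when the C pointer is NULL
    if validity is None and array.null_count != 0:
        raise ValueError("validity bitmap may be NULL only if null_count is 0")
    return validity is not None


array, backing = make_struct_array_without_nulls(3)
print(array.n_buffers, check_struct_buffers(array))
```

The slot for the validity bitmap is always counted in `n_buffers`, while the pointer stored in that slot may be NULL when `null_count` is 0. Setting `n_buffers` to 0 instead would fail the first check in this sketch, which matches the behaviour reported at the start of the thread.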
