I opened https://issues.apache.org/jira/browse/ARROW-15846

Regards

Antoine.


On 04/03/2022 at 15:05, Antoine Pitrou wrote:

On 04/03/2022 at 15:01, Hanqi Wu wrote:
Hi Antoine,

I agree n_buffers should still be set to 1. But as per the Arrow documentation 
quoted below, the value of n_buffers will be 0 if a struct array has no null 
values. This is what confuses me.

"A struct array does not have any additional allocated physical storage for its 
values. A struct array must still have an allocated validity bitmap, if it has one 
or more null values."

Ok, the wording is clumsy, but note "*allocated* validity bitmap" :-) In
other words, if the null count is 0, the validity bitmap need not be
allocated, but it's still "present" in the metadata (for example as a
null pointer, if using the C data interface).
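
To make the "present but not allocated" distinction concrete, here is a minimal sketch using Python's ctypes. The field layout follows the `ArrowArray` struct of the C data interface; the helper name `make_struct_without_nulls` is hypothetical, the `release` callback is simplified to a plain pointer, and child arrays are left out for brevity (so this models a struct with zero fields). It is an illustration only, not how PyArrow builds its exports.

```python
import ctypes


class ArrowArray(ctypes.Structure):
    """ctypes mirror of the C data interface's ArrowArray struct."""


ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowArray))),
    ("dictionary", ctypes.POINTER(ArrowArray)),
    # Assumption: the release callback is stored as an opaque pointer here.
    ("release", ctypes.c_void_p),
    ("private_data", ctypes.c_void_p),
]


def make_struct_without_nulls(length):
    """Hypothetical helper: describe a struct array that has no nulls.

    The `buffers` array still has one slot for the validity bitmap,
    but that slot holds a NULL pointer because nothing was allocated.
    """
    # A one-element pointer array; ctypes zero-initialises it, so slot 0 is NULL.
    slots = (ctypes.c_void_p * 1)()

    arr = ArrowArray()
    arr.length = length
    arr.null_count = 0   # no nulls, so the bitmap may be NULL
    arr.offset = 0
    arr.n_buffers = 1    # the slot is still counted
    arr.n_children = 0   # children omitted in this sketch
    arr.buffers = ctypes.cast(slots, ctypes.POINTER(ctypes.c_void_p))
    # Return `slots` too so the caller keeps the pointer array alive.
    return arr, slots


arr, slots = make_struct_without_nulls(3)
assert arr.n_buffers == 1
assert arr.buffers[0] is None  # ctypes reports a NULL void* as None
```

The point of the sketch is that `n_buffers` counts slots in the `buffers` array, not allocated bitmaps: the slot exists, and only its pointer is NULL.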

This probably deserves clarifying, though. I'll open an issue.

Regards

Antoine.



https://arrow.apache.org/docs/format/Columnar.html#struct-layout

Thanks,
Hanqi

On Mar 4, 2022, at 8:57 AM, Antoine Pitrou <anto...@python.org> wrote:


Hi Hanqi,

On 04/03/2022 at 14:53, Hanqi Wu wrote:
Hi Antoine,
I agree. But my question is about an Arrow StructArray with no null values. In 
this case, as per the documentation, n_buffers should be set to 0.

Well, no.  As I said, it should still be 1.

You can also take a look at the fields produced when exporting such an array.

Regards

Antoine.




However, “import_from_c” expects StructArray to always have at least 1 buffer 
allocated, otherwise it throws an exception.
Best,
Hanqi
On Mar 4, 2022, at 8:47 AM, Antoine Pitrou <anto...@python.org> wrote:


On 04/03/2022 at 04:17, Hanqi Wu wrote:
Hello community,
As per the documentation below, an Arrow StructArray won’t have any physical 
buffers backing it if it doesn’t contain any null values:
https://arrow.apache.org/docs/format/Columnar.html#struct-layout
However, PyArrow complains if you try to import from C an ArrowArray 
representing a Struct type without a null vector (no nulls), which, according 
to the Arrow spec above, is permitted.
In more detail, the import from C expects the number of buffers to be 1, as 
coded here:
https://github.com/apache/arrow/blob/8e43f23dcc6a9e630516228f110c48b64d13cec6/cpp/src/arrow/c/bridge.cc#L1332
This seems to suggest it will always expect the validity bitmap.

Not really.  It expects one entry in the `buffers` array
(`n_buffers == 1`), but the entry can be NULL:

"""The pointer to the null bitmap buffer, if the data type specifies one, MAY be NULL only if 
ArrowArray.null_count is 0."""

https://arrow.apache.org/docs/format/CDataInterface.html#c.ArrowArray.buffers 
You can also see the corresponding logic in the import code here:
https://github.com/apache/arrow/blob/8e43f23dcc6a9e630516228f110c48b64d13cec6/cpp/src/arrow/c/bridge.cc#L1423-L1431
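
As a rough illustration of the rule quoted above, the check can be sketched in plain Python. The function name `check_struct_buffers` and its error messages are hypothetical, and this is not a transcription of the linked C++ import code; it only encodes the two conditions stated in this thread: a struct array has exactly one entry in `buffers`, and that entry may be NULL only when `null_count` is 0.

```python
def check_struct_buffers(n_buffers, null_count, buffers):
    """Hypothetical validator for a struct array's buffer metadata.

    `buffers` is a list with one item per slot: an integer address,
    or None to stand for a NULL pointer.
    """
    # A struct array always has one slot, reserved for the validity bitmap.
    if n_buffers != 1 or len(buffers) != 1:
        raise ValueError(
            f"expected 1 buffer slot for a struct array, got {n_buffers}"
        )
    # The slot may hold NULL only when there are no nulls at all.
    if buffers[0] is None and null_count != 0:
        raise ValueError(
            "validity bitmap is NULL but null_count is not 0"
        )
    return True


# No nulls: one slot, NULL pointer. Accepted.
assert check_struct_buffers(1, 0, [None])

# What the original question assumed: zero slots. Rejected.
try:
    check_struct_buffers(0, 0, [])
except ValueError as exc:
    print("rejected:", exc)
```

Under these assumptions, the exception reported in the original question corresponds to the second case: passing `n_buffers == 0` rather than one slot containing NULL.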

Regards

Antoine.
