It sounds like a bug if is_mutable_ is true but mutable_data_ is nullptr.

Regards

Antoine.

On 21/04/2021 at 03:17, Weston Pace wrote:
If it comes from pandas (and is eligible for zero-copy) then the buffer implementation will be `NumPyBuffer`. Printing one in GDB yields...

```
$12 = {_vptr.Buffer = 0x7f0b66e147f8 <vtable for arrow::py::NumPyBuffer+16>, is_mutable_ = true, is_cpu_ = true, data_ = 0x55b71f901a70 "\001", mutable_data_ = 0x0, size_ = 16, capacity_ = 16, parent_ = {<std::__shared_ptr<arrow::Buffer, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<arrow::Buffer, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x0, _M_refcount = {_M_pi = 0x0}}, <No data fields>}, memory_manager_ = {<std::__shared_ptr<arrow::MemoryManager, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<arrow::MemoryManager, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x55b71fdca4e0, _M_refcount = {_M_pi = 0x55b71fb90640}}, <No data fields>}}
```

Notice that `is_cpu_` and `is_mutable_` are both `true`, while `mutable_data_` is `0x0`. It may be a bug that `is_mutable_` is true. Then again, maybe not, as the flag appears to report whether the underlying numpy buffer itself is mutable...

```
if (PyArray_FLAGS(ndarray) & NPY_ARRAY_WRITEABLE) {
  is_mutable_ = true;
}
```

On Tue, Apr 20, 2021 at 2:15 PM Niranda Perera <[email protected]> wrote:

Hi all,

We have been using Arrow v2.0.0 and we encountered the following issue.

I was reading a table with numeric data using pandas.read_csv and then converting it into a pyarrow table. In our application (Cylon <https://github.com/cylondata/cylon>), we access this pyarrow table from C++. We want to access the mutable data of the arrays in the pyarrow table, but the following returns a nullptr.

    T *mutable_data = array->data()->GetMutableValues<T>(1); // returns nullptr

Interestingly,

    array->data()->buffers[1]->IsMutable(); // returns true
    array->data()->buffers[1]->IsCpu(); // returns true

This only happens when I use a pandas df to create the pyarrow table. It does not happen when I use pyarrow.read_csv.
So, I am guessing there's some issue in the buffer creation from the pandas df. Is this expected behavior, or has it been resolved in releases after v2.0?

Best
--
Niranda Perera
https://niranda.dev/
@n1r44 <https://twitter.com/N1R44>
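
To illustrate the state discussed in this thread, here is a minimal sketch of a buffer whose mutability flag is set while its writable pointer is null, together with a defensive accessor that checks both. `MockBuffer`, `MakeInconsistentBuffer`, `CheckedMutableValues` and `ReadOnlyValues` are hypothetical names invented for this example. They are not part of Arrow. Only the field names are borrowed from the GDB dump above, and the sketch makes no claim about how `NumPyBuffer` is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a buffer; this is NOT arrow::Buffer.
// The field names mirror those visible in the GDB dump.
struct MockBuffer {
  bool is_mutable_ = false;
  const std::uint8_t* data_ = nullptr;
  std::uint8_t* mutable_data_ = nullptr;
  std::size_t size_ = 0;
};

// Builds a buffer in the state shown in the dump (assumption: the flag is
// set but the writable pointer is never assigned).
inline MockBuffer MakeInconsistentBuffer(const std::uint8_t* data,
                                         std::size_t size) {
  assert(data != nullptr);
  MockBuffer buf;
  buf.is_mutable_ = true;  // flag says "writable"
  buf.data_ = data;        // read-only view is valid
  buf.size_ = size;
  // buf.mutable_data_ deliberately left as nullptr
  return buf;
}

// Returns a typed writable pointer only when the flag and the pointer agree;
// otherwise returns nullptr so the caller can fall back to copying.
template <typename T>
T* CheckedMutableValues(const MockBuffer& buf) {
  if (!buf.is_mutable_ || buf.mutable_data_ == nullptr) {
    return nullptr;
  }
  return reinterpret_cast<T*>(buf.mutable_data_);
}

// Read-only access works regardless of the mutability state.
template <typename T>
const T* ReadOnlyValues(const MockBuffer& buf) {
  return reinterpret_cast<const T*>(buf.data_);
}
```

With a buffer in this state, `is_mutable_` is true while `CheckedMutableValues<T>` returns nullptr, which matches the symptom reported above. The point of the sketch is that a caller should test the returned pointer and not rely on the flag alone.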
