Re: Pyarrow RecordBatchStreamWriter and dictionaries

2021-04-24 Thread Wes McKinney
hi Radu — sounds potentially buggy, if you can create a Jira with a repro that would be very helpful On Thu, Apr 22, 2021 at 11:36 PM Radu Teodorescu wrote: > > Hi I am seeing a similar problem when serializing tables with lists of > dictionary encoded elements: each resulting chunk is pointing

Re: nullptr for mutable data in pyarrow table from pandas

2021-04-24 Thread Wes McKinney
I just opened https://issues.apache.org/jira/browse/ARROW-12530. This probably is an easy PR (we should add a unit test for the NumPyBuffer bug as part of it), but will have implications for 3rd party libraries that implement subclasses of Buffer because we're changing the base class members. On S

Re: nullptr for mutable data in pyarrow table from pandas

2021-04-24 Thread Niranda Perera
+1 for former. On Sat, Apr 24, 2021 at 9:30 AM Wes McKinney wrote: > Yes, I think the former would be acceptable. I don't think that anyone > should be putting Buffer::mutable_data() on the inner loop of a hot > path. > > On Sat, Apr 24, 2021 at 8:09 AM Antoine Pitrou wrote: > > > > > > It depe

Re: nullptr for mutable data in pyarrow table from pandas

2021-04-24 Thread Wes McKinney
Yes, I think the former would be acceptable. I don't think that anyone should be putting Buffer::mutable_data() on the inner loop of a hot path. On Sat, Apr 24, 2021 at 8:09 AM Antoine Pitrou wrote: > > > It depends what that entails exactly? Would we introduce a conditional > in release mode: >
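A minimal sketch of the "not on the inner loop of a hot path" point, using a hypothetical `ToyBuffer` class and `FillBuffer` helper (neither is Arrow's API). Even if `mutable_data()` carries a branch, a caller can fetch the pointer once before the loop and write through the raw pointer, so the check is paid once per call rather than once per element.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a buffer with a mutability flag; not arrow::Buffer.
class ToyBuffer {
 public:
  ToyBuffer(uint8_t* data, int64_t size, bool is_mutable)
      : data_(data), size_(size), is_mutable_(is_mutable) {}

  const uint8_t* data() const { return data_; }
  int64_t size() const { return size_; }
  bool is_mutable() const { return is_mutable_; }

  // One branch per call: returns nullptr for read-only buffers.
  uint8_t* mutable_data() {
    return is_mutable_ ? const_cast<uint8_t*>(data_) : nullptr;
  }

 private:
  const uint8_t* data_;
  int64_t size_;
  bool is_mutable_;
};

// Hypothetical helper: fills the buffer with `value`, fetching the mutable
// pointer once outside the loop. Returns false if the buffer is read-only.
bool FillBuffer(ToyBuffer* buf, uint8_t value) {
  uint8_t* out = buf->mutable_data();  // the branch happens here, once
  if (out == nullptr) return false;
  for (int64_t i = 0; i < buf->size(); ++i) {
    out[i] = value;  // the hot loop itself has no mutability check
  }
  return true;
}
```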

Re: nullptr for mutable data in pyarrow table from pandas

2021-04-24 Thread Antoine Pitrou
It depends what that entails exactly? Would we introduce a conditional in release mode: uint8_t* mutable_data() { return is_mutable() ? const_cast<uint8_t*>(data()) : nullptr; } or would we always blindly return the data pointer, which seems extremely dangerous to me: uint8_t* mutable_dat
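The two options in the message above can be contrasted in a small sketch. `MiniBuffer`, `MutableDataChecked` and `MutableDataUnchecked` are hypothetical names used only for illustration, not Arrow's real members. The checked variant keeps a branch in release builds and hands out `nullptr` for read-only buffers; the unchecked variant relies on a debug-only `assert`, so in release builds it happily returns a writable pointer to memory that may be read-only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical minimal buffer; not Arrow's actual Buffer class.
struct MiniBuffer {
  const uint8_t* data;
  bool is_mutable;
};

// Option 1: conditional in release mode. Misuse yields nullptr, which
// typically crashes on first dereference instead of silently corrupting memory.
inline uint8_t* MutableDataChecked(const MiniBuffer& buf) {
  return buf.is_mutable ? const_cast<uint8_t*>(buf.data) : nullptr;
}

// Option 2: always return the pointer. The assert is compiled out under
// NDEBUG, so release builds give no protection against writing to
// read-only memory (undefined behavior).
inline uint8_t* MutableDataUnchecked(const MiniBuffer& buf) {
  assert(buf.is_mutable);
  return const_cast<uint8_t*>(buf.data);
}
```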

Re: nullptr for mutable data in pyarrow table from pandas

2021-04-24 Thread Wes McKinney
hi folks, Thoughts about this? Since we already assert that is_mutable_ is true in debug builds when accessing mutable_data_, using a const cast here seems relatively benign, and then we can drop 8 bytes from the Buffer struct On Wed, Apr 21, 2021 at 10:10 AM Wes McKinney wrote: > > I'd be open
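A rough sketch of the proposal above, assuming two hypothetical simplified layouts (`LayoutWithMutablePtr`, `LayoutWithoutMutablePtr`) rather than the real `Buffer` members. Keeping only `data_` and deriving the mutable pointer with a `const_cast`, guarded by an `assert` that is active only in debug builds, removes one pointer-sized member: 8 bytes on a typical 64-bit platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical "before" layout: a separate mutable pointer duplicates data_.
struct LayoutWithMutablePtr {
  const uint8_t* data_;
  uint8_t* mutable_data_;  // same address as data_, or nullptr if immutable
  int64_t size_;
  int64_t capacity_;
  bool is_mutable_;
};

// Hypothetical "after" layout: the mutable pointer is derived on demand.
struct LayoutWithoutMutablePtr {
  const uint8_t* data_;
  int64_t size_;
  int64_t capacity_;
  bool is_mutable_;

  uint8_t* mutable_data() {
    assert(is_mutable_);  // debug-only check, compiled out under NDEBUG
    return const_cast<uint8_t*>(data_);
  }
};

// Bytes saved per object by dropping the redundant pointer member.
constexpr std::size_t kSavedBytes =
    sizeof(LayoutWithMutablePtr) - sizeof(LayoutWithoutMutablePtr);
```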

[NIGHTLY] Arrow Build Report for Job nightly-2021-04-24-0

2021-04-24 Thread Crossbow
Arrow Build Report for Job nightly-2021-04-24-0 All tasks: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-04-24-0 Failed Tasks: - conda-linux-gcc-py36-arm64: URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-04-24-0-drone-conda-linux-g