Re: Efficiently allocating an empty vector (python)

2019-12-11 Thread Weston Pace
This works very well and is much simpler. Thank you for the workaround.

On Wed, Dec 11, 2019 at 10:29 AM Antoine Pitrou wrote:
>
> As a workaround, you can use the following hack:
>
> >>> arr = pa.Array.from_buffers(pa.null(), 123, [pa.py_buffer(b"")])
> >>> arr
> 123 nulls
> >>> arr

2019-12-11 Thread Antoine Pitrou
As a workaround, you can use the following hack:

    >>> arr = pa.Array.from_buffers(pa.null(), 123, [pa.py_buffer(b"")])
    >>> arr
    123 nulls
    >>> arr.cast(pa.int32())
    [ null, null, null, null, null, null, null, null, null, null, ...
      null, null, null, null, null,

2019-12-11 Thread Weston Pace
Thanks. Ted, I tried using numpy in a similar way to your approach and got the same performance. For now I am keeping a dictionary that maps each data type to a large pre-allocated empty array, which should do the job in the meantime.

On Wed, Dec 11, 2019 at 9:20 AM Antoine Pitrou wrote:
>
> There's a C++ fac

2019-12-11 Thread Antoine Pitrou
There's a C++ facility to do this, but it's not exposed in Python yet. I opened ARROW-7375 for it.

Regards

Antoine.

On 11/12/2019 at 19:36, Weston Pace wrote:
> I'm trying to combine multiple parquet files. They were produced at
> different points in time and have different columns. For e

2019-12-11 Thread Ted Gooch
Not sure if this is any better, but I have an open PR right now in Iceberg where we are doing something similar:

https://github.com/apache/incubator-iceberg/pull/544/commits/28166fd3f0e3a24863048a2721f1ae69f243e2af#diff-51d6edf951c105e1e62a3f1e8b4640aaR319-R341

    @staticmethod
    def create_null_colum