Re: Efficiently allocating an empty vector (python)

2019-12-11 Thread Weston Pace
This works very well and is much simpler. Thank you for the workaround.

On Wed, Dec 11, 2019 at 10:29 AM Antoine Pitrou wrote:
> As a workaround, you can use the following hack:
>
> >>> arr = pa.Array.from_buffers(pa.null(), 123, [pa.py_buffer(b"")])
> >>> arr
> 123 nulls
> >>> arr

Re: Efficiently allocating an empty vector (python)

2019-12-11 Thread Antoine Pitrou
As a workaround, you can use the following hack:

>>> arr = pa.Array.from_buffers(pa.null(), 123, [pa.py_buffer(b"")])
>>> arr
123 nulls
>>> arr.cast(pa.int32())
[ null, null, null, null, null, null, null, null, null, null, ... null, null, null, null, null,

Re: Efficiently allocating an empty vector (python)

2019-12-11 Thread Weston Pace
Thanks. Ted, I tried a numpy-based approach similar to yours and saw the same performance. For the time being I am using a dictionary that maps each data type to a large pre-allocated empty array, which should work for me. On Wed, Dec 11, 2019 at 9:20 AM Antoine Pitrou wrote: > > There's a C++ fac

Re: Efficiently allocating an empty vector (python)

2019-12-11 Thread Antoine Pitrou
There's a C++ facility to do this, but it's not exposed in Python yet. I opened ARROW-7375 for it. Regards Antoine. On 11/12/2019 at 19:36, Weston Pace wrote: > I'm trying to combine multiple parquet files. They were produced at > different points in time and have different columns. For e

Re: Efficiently allocating an empty vector (python)

2019-12-11 Thread Ted Gooch
Not sure if this is any better, but I have an open PR right now in Iceberg, where we are doing something similar:
https://github.com/apache/incubator-iceberg/pull/544/commits/28166fd3f0e3a24863048a2721f1ae69f243e2af#diff-51d6edf951c105e1e62a3f1e8b4640aaR319-R341

@staticmethod
def create_null_colum

Efficiently allocating an empty vector (python)

2019-12-11 Thread Weston Pace
I'm trying to combine multiple parquet files. They were produced at different points in time and have different columns. For example, one has columns A, B, C. Two has columns B, C, D. Three has columns C, D, E. I want to concatenate all three into one table with columns A, B, C, D, E. To do t