This works very well and is much simpler. Thank you for the workaround.
On Wed, Dec 11, 2019 at 10:29 AM Antoine Pitrou wrote:
>
> As a workaround, you can use the following hack:
>
> >>> arr = pa.Array.from_buffers(pa.null(), 123, [pa.py_buffer(b"")])
> >>> arr
> 123 nulls
As a workaround, you can use the following hack:
>>> arr = pa.Array.from_buffers(pa.null(), 123, [pa.py_buffer(b"")])
>>> arr
123 nulls
>>> arr.cast(pa.int32())
[
null,
null,
null,
...
null,
null
]
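Applied to the original problem, a minimal sketch (the table and column
names here are made up) of padding a table with a missing column:

import pyarrow as pa

# Hypothetical table that is missing column "A"; pad it with the hack above.
table = pa.table({"B": [1, 2, 3]})
nulls = pa.Array.from_buffers(pa.null(), table.num_rows, [pa.py_buffer(b"")])
table = table.append_column(pa.field("A", pa.int32()), nulls.cast(pa.int32()))
print(table.column("A"))  # 3 nulls, typed int32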
Thanks, Ted. I tried using numpy, similar to your approach, and saw the
same performance. For the time being I am using a dictionary mapping each
data type to a pre-allocated large empty array, which should work for me.
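In case it's useful to anyone else, a minimal sketch of that scheme (the
type list and the size cap are assumptions):

import pyarrow as pa

MAX_ROWS = 1_000_000  # assumed upper bound on any table's length
_big_null = pa.Array.from_buffers(pa.null(), MAX_ROWS, [pa.py_buffer(b"")])
_null_cache = {t: _big_null.cast(t)
               for t in (pa.int32(), pa.int64(), pa.float64(), pa.string())}

def null_column(data_type, length):
    # Slice a pre-cast null array down to the requested length.
    return _null_cache[data_type].slice(0, length)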
On Wed, Dec 11, 2019 at 9:20 AM Antoine Pitrou wrote:
>
> There's a C++ facility to do this, but it's not exposed in Python yet.
> I opened ARROW-7375 for it.
There's a C++ facility to do this, but it's not exposed in Python yet.
I opened ARROW-7375 for it.
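Once it's exposed, usage might look something like this (the Python name
pa.nulls is an assumption here, pending resolution of the issue):

import pyarrow as pa

# Assumes a pyarrow release where ARROW-7375 has landed and the C++
# facility is exposed as pa.nulls().
arr = pa.nulls(123, type=pa.int32())  # 123 typed nulls, no cast needed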
Regards
Antoine.
On 11/12/2019 at 19:36, Weston Pace wrote:
> I'm trying to combine multiple parquet files. They were produced at
> different points in time and have different columns. For example, one
> has columns A, B, C. Two has columns B, C, D. Three has columns C, D,
> E. I want to concatenate all three into one table with columns A, B,
> C, D, E.
Not sure if this is any better, but I have an open PR right now in Iceberg,
where we are doing something similar:
https://github.com/apache/incubator-iceberg/pull/544/commits/28166fd3f0e3a24863048a2721f1ae69f243e2af#diff-51d6edf951c105e1e62a3f1e8b4640aaR319-R341
@staticmethod
def create_null_column(...):
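In the same spirit, a rough sketch (my reconstruction; the names and
details are assumptions, not necessarily the actual Iceberg code) that
builds a null column from a fully-masked numpy array:

import numpy as np
import pyarrow as pa

class ArrowReader:  # stand-in for the class in the PR
    @staticmethod
    def create_null_column(num_rows, arrow_type):
        # Every slot is masked, so the values themselves never matter.
        data = np.zeros(num_rows)
        mask = np.ones(num_rows, dtype=bool)  # True marks a null slot
        return pa.array(data, type=arrow_type, mask=mask)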
I'm trying to combine multiple parquet files. They were produced at
different points in time and have different columns. For example, one has
columns A, B, C. Two has columns B, C, D. Three has columns C, D, E. I
want to concatenate all three into one table with columns A, B, C, D, E.
To do this I need to fill the missing columns in each table with nulls,
which means creating a null-filled array of a given type and length.
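A minimal sketch of the overall combination (the file names are made up,
it assumes a pyarrow version with Table.select, and the missing columns
are filled using the null-array cast suggested above):

import pyarrow as pa
import pyarrow.parquet as pq

paths = ["one.parquet", "two.parquet", "three.parquet"]
tables = [pq.read_table(p) for p in paths]

# Build the union of all schemas (first occurrence of a name wins).
unified = pa.schema([])
for t in tables:
    for field in t.schema:
        if unified.get_field_index(field.name) == -1:
            unified = unified.append(field)

def pad(table):
    # Add a null-filled column for every field the table is missing,
    # then reorder the columns to match the unified schema.
    for field in unified:
        if table.schema.get_field_index(field.name) == -1:
            nulls = pa.Array.from_buffers(
                pa.null(), table.num_rows, [pa.py_buffer(b"")])
            table = table.append_column(field, nulls.cast(field.type))
    return table.select([f.name for f in unified])

combined = pa.concat_tables([pad(t) for t in tables])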