This works very well and is much simpler. Thank you for the workaround.
On Wed, Dec 11, 2019 at 10:29 AM Antoine Pitrou wrote:
>
> As a workaround, you can use the following hack:
>
> >>> arr = pa.Array.from_buffers(pa.null(), 123, [pa.py_buffer(b"")])
> >>> arr
> 123 nulls
As a workaround, you can use the following hack:
>>> arr = pa.Array.from_buffers(pa.null(), 123, [pa.py_buffer(b"")])
>>> arr
123 nulls
>>> arr.cast(pa.int32())
[
null,
null,
null,
null,
null,
null,
null,
null,
null,
null,
...
null,
null,
null,
null,
null
]
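
Wrapped up as a small helper, the hack looks like this (the function name is just illustrative):

import pyarrow as pa

def make_null_array(length, dtype):
    # Build an untyped NullArray of the desired length, then cast it
    # to the target type to get a fully-null typed array.
    nulls = pa.Array.from_buffers(pa.null(), length, [pa.py_buffer(b"")])
    return nulls.cast(dtype)

>>> make_null_array(123, pa.int32()).null_count
123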
Thanks, Ted. I tried using numpy, similar to your approach, and saw the same
performance. For the time being I am using a dictionary mapping each
data type to a pre-allocated big empty array, which works well enough for now.
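Roughly, the caching looks like this (the names and the size constant are illustrative, not my exact code):

import pyarrow as pa

_NULL_CACHE = {}
_MAX_ROWS = 1_000_000  # assumed upper bound on the rows we ever need

def cached_null_array(dtype, length):
    # Pre-allocate one big all-null array per data type, then slice
    # it to the requested length instead of rebuilding it each time.
    if dtype not in _NULL_CACHE:
        big = pa.Array.from_buffers(pa.null(), _MAX_ROWS, [pa.py_buffer(b"")])
        _NULL_CACHE[dtype] = big.cast(dtype)
    return _NULL_CACHE[dtype].slice(0, length)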
On Wed, Dec 11, 2019 at 9:20 AM Antoine Pitrou wrote:
>
> There's a C++ facility to do this, but it's not exposed in Python yet.
> I opened ARROW-7375 for it.
There's a C++ facility to do this, but it's not exposed in Python yet.
I opened ARROW-7375 for it.
Regards
Antoine.
On 11/12/2019 19:36, Weston Pace wrote:
> I'm trying to combine multiple parquet files. They were produced at
> different points in time and have different columns.
Not sure if this is any better, but I have an open PR right now in Iceberg,
where we are doing something similar:
https://github.com/apache/incubator-iceberg/pull/544/commits/28166fd3f0e3a24863048a2721f1ae69f243e2af#diff-51d6edf951c105e1e62a3f1e8b4640aaR319-R341
@staticmethod
def create_null_column(...):
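
For flavor, a rough standalone sketch of the same idea (this is not the PR's actual code; the helper name and structure are guesses):

import pyarrow as pa

def pad_to_schema(table, schema):
    # For each field in the unified schema, reuse the table's column
    # if present; otherwise fill in an all-null column of that type,
    # built with the NullArray-then-cast trick discussed above.
    arrays = []
    for field in schema:
        if field.name in table.schema.names:
            arrays.append(table.column(field.name))
        else:
            nulls = pa.Array.from_buffers(pa.null(), len(table),
                                          [pa.py_buffer(b"")])
            arrays.append(nulls.cast(field.type))
    return pa.Table.from_arrays(arrays, schema=schema)

Tables padded this way to a common schema can then be combined with pa.concat_tables().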