Re: Question about memory usage and type casting using pyarrow Table

2023-02-16 Thread Weston Pace
> (1) if I want to cast n columns to a different type (e.g., float to int). What is the smallest memory overhead that I can do? (memory overhead of 1 column, n columns or 100 columns?) You should be able to do this with only 1 column of overhead. Though you might need to go a little out of your w

Re: Question about memory usage and type casting using pyarrow Table

2023-02-15 Thread Aldrin
I think you can replace the schema metadata using [1]. You can perhaps also do the same for the field metadata, depending on where timezone metadata may be on a timestamp array [2]. [1]: https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.replace_schema_metadata [2]: ht

Re: Question about memory usage and type casting using pyarrow Table

2023-02-15 Thread Li Jin
Oh thanks, that could be a workaround! I thought pa tables are supposed to be immutable; is there a safe way to just change the metadata? On Wed, Feb 15, 2023 at 5:44 PM Rok Mihevc wrote: > Well that's suboptimal. As a workaround I suppose you could just change the > metadata if the array is tim

Re: Question about memory usage and type casting using pyarrow Table

2023-02-15 Thread Rok Mihevc
Well that's suboptimal. As a workaround I suppose you could just change the metadata if the array is timezone aware. On Wed, Feb 15, 2023 at 10:37 PM Li Jin wrote: > Oh found this comment: > > https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_cast_temporal.cc#L156

Re: Question about memory usage and type casting using pyarrow Table

2023-02-15 Thread Li Jin
Oh found this comment: https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_cast_temporal.cc#L156 On Wed, Feb 15, 2023 at 4:23 PM Li Jin wrote: > Not sure if this is actually a bug or expected behavior - I filed > https://github.com/apache/arrow/issues/34210 > > On

Re: Question about memory usage and type casting using pyarrow Table

2023-02-15 Thread Li Jin
Not sure if this is actually a bug or expected behavior - I filed https://github.com/apache/arrow/issues/34210 On Wed, Feb 15, 2023 at 4:15 PM Li Jin wrote: > Hmm..something feels off here - I did the following experiment on Arrow 11 > and casting timestamp-naive to int64 is much faster than cas

Re: Question about memory usage and type casting using pyarrow Table

2023-02-15 Thread Li Jin
Hmm..something feels off here - I did the following experiment on Arrow 11 and casting timestamp-naive to int64 is much faster than casting timestamp-naive to timestamp-utc: In [16]: %time table.cast(schema_int) CPU times: user 114 µs, sys: 30 µs, total: 144 µs *Wall time: 231 µs* Out[16]: pyarrow

Re: Question about memory usage and type casting using pyarrow Table

2023-02-15 Thread Rok Mihevc
I'm not sure about (1) but I'm pretty sure for (2) doing a cast of tz-aware timestamp to tz-naive should be a metadata-only change. On Wed, Feb 15, 2023 at 4:19 PM Li Jin wrote: > Asking (2) because IIUC this is a metadata operation that could be zero > copy but I am not sure if this is actually

Re: Question about memory usage and type casting using pyarrow Table

2023-02-15 Thread Li Jin
Asking (2) because IIUC this is a metadata operation that could be zero copy but I am not sure if this is actually the case. On Wed, Feb 15, 2023 at 10:17 AM Li Jin wrote: > Hello! > > I have some questions about type casting memory usage with pyarrow Table. > Let's say I have a pyarrow Table wi

Question about memory usage and type casting using pyarrow Table

2023-02-15 Thread Li Jin
Hello! I have some questions about type casting memory usage with pyarrow Table. Let's say I have a pyarrow Table with 100 columns. (1) if I want to cast n columns to a different type (e.g., float to int). What is the smallest memory overhead that I can do? (memory overhead of 1 column, n columns