Re: Optimized way of converting list of pa.Array to pd.DataFrame with index

2023-08-31 Thread Li Jin
Although, I am curious whether there are any downsides to using `self_destruct`? On Thu, Aug 31, 2023 at 1:05 PM Li Jin wrote: > Ah I see, thanks for the explanation. self_destruct probably won't benefit my case, then. (The pa.Array here is a slice from another batch, so there will be other references to the data backing this array.)

Re: Optimized way of converting list of pa.Array to pd.DataFrame with index

2023-08-31 Thread Li Jin
Ah I see, thanks for the explanation. self_destruct probably won't benefit my case, then. (The pa.Array here is a slice from another batch, so there will be other references to the data backing this array.) On Thu, Aug 31, 2023 at 11:24 AM David Li wrote: > Not sure about the conversion, but regarding self_destruct: […]

Re: Optimized way of converting list of pa.Array to pd.DataFrame with index

2023-08-31 Thread David Li
Not sure about the conversion, but regarding `self_destruct`: the problem is that it only provides memory savings in limited situations that are hard to identify from the outside. When enabled, PyArrow will always discard its reference to the array after conversion, and if there are no other ref […]

Optimized way of converting list of pa.Array to pd.DataFrame with index

2023-08-31 Thread Li Jin
Hi, I am working on some code where I have a list of pa.Arrays and I am creating a pandas.DataFrame from it. I also want to set the index of the pd.DataFrame to be the first Array in the list. Currently I am doing something like: `df = pa.Table.from_arrays(arrs, names=input_names).to_pandas()` `df.set_i…`