Ah I see - thanks for the explanation. self_destruct probably won't benefit in my case then. (The pa.Array here is a slice from another batch, so there will be other references to the data backing this array.)
On Thu, Aug 31, 2023 at 11:24 AM David Li <lidav...@apache.org> wrote:

> Not sure about the conversion, but regarding self_destruct: the problem is
> that it only provides memory savings in limited situations that are hard to
> figure out from the outside. When enabled, PyArrow will always discard the
> reference to the array after conversion, and if there are no other
> references, that would free the array. But different arrays may be backed
> by the same underlying memory buffer (this is generally true for IPC and
> Flight, for example), so freeing the array won't actually free any memory
> since the buffer is still alive. It would only save memory if you ensure
> each array is actually backed by its own memory allocations (which right
> now would generally mean copying data up front!).
>
> On Thu, Aug 31, 2023, at 11:11, Li Jin wrote:
> > Hi,
> >
> > I am working on some code where I have a list of pa.Arrays and I am
> > creating a pandas.DataFrame from it. I also want to set the index of the
> > pd.DataFrame to be the first Array in the list.
> >
> > Currently I am doing something like:
> > "
> > df = pa.Table.from_arrays(arrs, names=input_names).to_pandas()
> > df.set_index(input_names[0], inplace=True)
> > "
> >
> > I am curious if this is the best I can do? Also I wonder if it is still
> > worthwhile to use the "self_destruct=True" option here (I noticed it has
> > been EXPERIMENTAL for a long time)
> >
> > Thanks!
> > Li