Although - I am curious if there are any downsides to using `self_destruct`?

On Thu, Aug 31, 2023 at 1:05 PM Li Jin <ice.xell...@gmail.com> wrote:
> Ah I see - thanks for the explanation. self_destruct probably won't
> benefit in my case then. (The pa.Array here is a slice from another batch
> so there will be other references to the data backing this array)
>
> On Thu, Aug 31, 2023 at 11:24 AM David Li <lidav...@apache.org> wrote:
>
>> Not sure about the conversion, but regarding self_destruct: the problem
>> is that it only provides memory savings in limited situations that are hard
>> to figure out from the outside. When enabled, PyArrow will always discard
>> the reference to the array after conversion, and if there are no other
>> references, that would free the array. But different arrays may be backed
>> by the same underlying memory buffer (this is generally true for IPC and
>> Flight, for example), so freeing the array won't actually free any memory
>> since the buffer is still alive. It would only save memory if you ensure
>> each array is actually backed by its own memory allocations (which
>> would generally mean copying data up front!).
>>
>> On Thu, Aug 31, 2023, at 11:11, Li Jin wrote:
>> > Hi,
>> >
>> > I am working on some code where I have a list of pa.Arrays and I am
>> > creating a pandas.DataFrame from it. I also want to set the index of the
>> > pd.DataFrame to be the first Array in the list.
>> >
>> > Currently I am doing something like:
>> > "
>> > df = pa.Table.from_arrays(arrs, names=input_names).to_pandas()
>> > df.set_index(input_names[0], inplace=True)
>> > "
>> >
>> > I am curious if this is the best I can do? Also I wonder if it is still
>> > worthwhile to use the "self_destruct=True" option here (I noticed it has
>> > been EXPERIMENTAL for a long time)
>> >
>> > Thanks!
>> > Li
>> >