Although, I am curious: are there any downsides to using `self_destruct`?

On Thu, Aug 31, 2023 at 1:05 PM Li Jin <ice.xell...@gmail.com> wrote:

> Ah I see - thanks for the explanation. self_destruct probably won't
> benefit in my case then. (The pa.Array here is a slice from another batch
> so there will be other references to the data backing this array)
>
> On Thu, Aug 31, 2023 at 11:24 AM David Li <lidav...@apache.org> wrote:
>
>> Not sure about the conversion, but regarding self_destruct: the problem
>> is that it only provides memory savings in limited situations that are hard
>> to figure out from the outside. When enabled, PyArrow will always discard
>> the reference to the array after conversion, and if there are no other
>> references, that would free the array. But different arrays may be backed
>> by the same underlying memory buffer (this is generally true for IPC and
>> Flight, for example), so freeing the array won't actually free any memory
>> since the buffer is still alive. It would only save memory if you ensure
>> each array is actually backed by its own memory allocations (which right
>> now would generally mean copying data up front!).
>>
>> On Thu, Aug 31, 2023, at 11:11, Li Jin wrote:
>> > Hi,
>> >
>> > I am working on some code where I have a list of pa.Arrays and I am
>> > creating a pandas.DataFrame from it. I also want to set the index of the
>> > pd.DataFrame to be the first Array in the list.
>> >
>> > Currently I am doing something like:
>> > "
>> > df = pa.Table.from_arrays(arrs, names=input_names).to_pandas()
>> > df.set_index(input_names[0], inplace=True)
>> > "
>> >
>> > I am curious if this is the best I can do? Also I wonder if it is still
>> > worthwhile to use the "self_destruct=True" option here (I noticed it has
>> > been EXPERIMENTAL for a long time)
>> >
>> > Thanks!
>> > Li
>>
>
