Hi Bipin,

There are narrow circumstances where zero-copy pandas deserialization is possible. Firstly, I noted that we are short of documentation for Table.to_pandas, so I opened

https://issues.apache.org/jira/browse/ARROW-3356

It's possible there's a bug when zero_copy_only=True -- it is supposed to raise an exception if any memory allocations are required.

Can you give more information about what you mean by "my memory usage increases"? Did it increase by the footprint of the underlying memory? A minimal reproducible example would help us investigate further.

Thanks,
Wes

On Fri, Sep 28, 2018 at 5:29 PM Bipin Mathew <bipinmat...@gmail.com> wrote:
>
> Hello Everyone,
>
> I am just getting my feet wet with Apache Arrow and I am running into
> a bug or, more likely, simply misunderstanding the pyarrow API. I wrote
> out a four-column, million-row Apache Arrow table to shared memory and I
> am attempting to read it into a pandas dataframe. It is advertised that
> it is possible to do this in a zero-copy manner; however, when I run the
> to_pandas() method on the table I imported into pyarrow, my memory usage
> increases, indicating that it did not actually do a zero-copy conversion.
> Here is my code:
>
>     import pyarrow as pa
>     import pandas as pd
>     import numpy as np
>     import time
>
>     start = time.time()
>     mm = pa.memory_map('/dev/shm/arrow_table')
>     b = mm.read_buffer()
>     reader = pa.RecordBatchStreamReader(b)
>     z = reader.read_all()
>     print("reading time: " + str(time.time() - start))
>
>     start = time.time()
>     df = z.to_pandas(zero_copy_only=True, use_threads=True)
>     print("conversion time: " + str(time.time() - start))
>
> What am I doing wrong here? Or indeed am I simply misunderstanding what
> is meant by zero-copy in this context? My frantic google efforts only
> resulted in this possibly relevant issue, but it was unclear to me how
> it was resolved:
>
> https://github.com/apache/arrow/issues/1649
>
> I am using pyarrow 0.10.0.
>
> Regards,
>
> Bipin