Re: Help with zero-copy conversion of pyarrow table to pandas dataframe.

Wes McKinney Fri, 28 Sep 2018 14:41:35 -0700

hi Abdul -- Plasma vs. a memory map on /dev/shm should have the same
semantics re: memory copying, so I don't believe using Plasma will
change the outcome.


- Wes
On Fri, Sep 28, 2018 at 5:38 PM Abdul Rahman <[email protected]> wrote:
>
> Have you tried using Plasma, which is effectively what you are trying to do?
>
> https://arrow.apache.org/docs/python/plasma.html#using-arrow-and-pandas-with-plasma
>
>
> ________________________________
> From: Bipin Mathew <[email protected]>
> Sent: Friday, September 28, 2018 2:28:54 PM
> To: [email protected]
> Subject: Help with zero-copy conversion of pyarrow table to pandas dataframe.
>
> Hello Everyone,
>
>      I am just getting my feet wet with Apache Arrow and I am running into
> a bug or, more likely, simply misunderstanding the pyarrow API. I wrote out
> a four-column, million-row Apache Arrow table to shared memory and I am
> attempting to read it into a pandas DataFrame. It is advertised that it is
> possible to do this in a zero-copy manner; however, when I run the
> to_pandas() method on the table I imported into pyarrow, my memory usage
> increases, indicating that it did not actually do a zero-copy conversion.
> Here is my code:
>
>     import pyarrow as pa
>     import pandas as pd
>     import numpy as np
>     import time
>
>     # Map the Arrow stream in shared memory and read it into a table.
>     start = time.time()
>     mm = pa.memory_map('/dev/shm/arrow_table')
>     b = mm.read_buffer()
>     reader = pa.RecordBatchStreamReader(b)
>     z = reader.read_all()
>     print("reading time: " + str(time.time() - start))
>
>     # Convert to pandas, asking pyarrow to fail rather than copy.
>     start = time.time()
>     df = z.to_pandas(zero_copy_only=True, use_threads=True)
>     print("conversion time: " + str(time.time() - start))
>
>
> What am I doing wrong here? Or indeed am I simply misunderstanding what is
> meant by zero-copy in this context? My frantic google efforts only resulted
> in this possibly relevant issue, but it was unclear to me how it was
> resolved:
>
> https://github.com/apache/arrow/issues/1649
>
> I am using pyarrow 0.10.0.
>
> Regards,
>
> Bipin
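
For readers following the thread, here is a minimal standard-library sketch of the
distinction being asked about: a view over a memory map shares the mapped
pages, while converting it to a new object copies them. This is not pyarrow's
implementation and says nothing about what to_pandas() does internally; the
temporary file is only a stand-in for /dev/shm/arrow_table, and all names
below are illustrative.

```python
import mmap
import os
import tempfile

# Create a small file to map (a stand-in for /dev/shm/arrow_table).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(bytes(range(16)))

# Map the whole file; the mapping stays valid after the file object closes.
with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)

view = memoryview(mm)   # zero-copy: a window onto the mapped pages
copied = bytes(view)    # copy: a new, independent bytes object

mm[0] = 255             # mutate the mapping in place

view_sees_change = view[0] == 255    # the view reflects the write
copy_sees_change = copied[0] == 255  # the copy still holds the old byte

# Release the view before closing the mapping, then clean up.
view.release()
mm.close()
os.remove(path)
```

A conversion that behaves like `copied` allocates new memory proportional to
the data, which would show up as the kind of memory growth described above; one
that behaves like `view` would not.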
