Re: PyArrow & Python Multiprocessing

2018-05-16 Thread Robert Nishihara
You're welcome!
On Wed, May 16, 2018 at 6:13 PM Corey Nolet wrote:
> I must say, I’m super excited about using Arrow and Plasma.
>
> The code you just posted worked for me at home and I’m sure I’ll figure
> out what I was doing wrong tomorrow at work.
>
> Anyways, thanks so much for your help an

Re: PyArrow & Python Multiprocessing

2018-05-16 Thread Corey Nolet
I must say, I’m super excited about using Arrow and Plasma.
The code you just posted worked for me at home and I’m sure I’ll figure out what I was doing wrong tomorrow at work.
Anyways, thanks so much for your help and fast replies!
> On May 16, 2018, at 7:42 PM, Robert N

Re: PyArrow & Python Multiprocessing

2018-05-16 Thread Robert Nishihara
You should be able to do something like the following.
    # Start the store.
    plasma_store -s /tmp/store -m 10
Then in Python, do the following:
    import pandas as pd
    import pyarrow.plasma as plasma
    import numpy as np
    client = plasma.connect('/tmp/store', '', 0)
    series = pd.Series(np.zeros(10

Re: PyArrow & Python Multiprocessing

2018-05-16 Thread Corey Nolet
Robert, Thank you for the quick response. I've been playing around for a few hours to get a feel for how this works. If I understand correctly, it's better to have the Plasma client objects instantiated within each separate process? Weird things seemed to happen when I attempted to share a single
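
The per-process pattern asked about above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Plasma code: it swaps in the standard library's multiprocessing.shared_memory as a stand-in for the Plasma store so that it runs without a plasma_store process, and the names worker and run are hypothetical. It also assumes a POSIX system where the "fork" start method is available. The point it shows is that each worker opens its own handle to the shared data by name, instead of reusing a handle object created in the parent.

```python
import multiprocessing as mp
import struct
from multiprocessing import shared_memory


def worker(shm_name, index, results):
    # Open a fresh handle inside this process (stand-in for creating a
    # separate store client per process) rather than sharing the parent's.
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        (value,) = struct.unpack_from("<q", shm.buf, index * 8)
        results.put((index, value * 2))
    finally:
        shm.close()  # release this process's handle only


def run(values):
    # Assumption: "fork" is available (Linux/macOS); values is non-empty.
    ctx = mp.get_context("fork")
    shm = shared_memory.SharedMemory(create=True, size=8 * len(values))
    try:
        # Write the inputs once; workers read them without copying via pickle.
        struct.pack_into("<%dq" % len(values), shm.buf, 0, *values)
        results = ctx.Queue()
        procs = [
            ctx.Process(target=worker, args=(shm.name, i, results))
            for i in range(len(values))
        ]
        for p in procs:
            p.start()
        collected = dict(results.get(timeout=30) for _ in procs)
        for p in procs:
            p.join()
    finally:
        shm.close()
        shm.unlink()  # the creator removes the shared block
    return [collected[i] for i in range(len(values))]


if __name__ == "__main__":
    print(run([1, 2, 3]))
```

With a real object store the same shape would apply: pass an identifier (here the block name) to each worker and let the worker establish its own connection.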

Re: PyArrow & Python Multiprocessing

2018-05-16 Thread Robert Nishihara
Take a look at the Plasma object store: https://arrow.apache.org/docs/python/plasma.html Here's an example that uses it along with multiprocessing to sort a pandas DataFrame: https://github.com/apache/arrow/blob/master/python/examples/plasma/sorting/sort_df.py It's possible the example is a bit out