This seems possibly similar to the issue reported in https://github.com/apache/arrow/issues/1946 -- we never found a resolution. Could we open a JIRA to track the problem?
On Fri, Jul 6, 2018 at 3:57 AM, Josh Quigley <josh.quig...@lifetrading.com.au> wrote: > That works. I've tried a bunch of debugging and work arounds- as far as I > can tell this is just a problem with deserializr from components and > multiprocess. > > On Fri., 6 Jul. 2018, 5:12 pm Robert Nishihara, <robertnishih...@gmail.com> > wrote: > >> Can you reproduce it without all of the multiprocessing code? E.g., just >> call *pyarrow.serialize* in one interpreter. Then copy and paste the bytes >> into another interpreter and call *pyarrow.deserialize *or >> *pyarrow.deserialize_components*? >> On Thu, Jul 5, 2018 at 9:48 PM Josh Quigley < >> josh.quig...@lifetrading.com.au> >> wrote: >> >> > Attachment inline: >> > >> > import pyarrow as pa >> > import multiprocessing as mp >> > import numpy as np >> > >> > def make_payload(): >> > """Common function - make data to send""" >> > return ['message', 123, np.random.uniform(-100, 100, (4, 4))] >> > >> > def send_payload(payload, connection): >> > """Common function - serialize & send data through a socket""" >> > s = pa.serialize(payload) >> > c = s.to_components() >> > >> > # Send >> > data = c.pop('data') >> > connection.send(c) >> > for d in data: >> > connection.send_bytes(d) >> > connection.send_bytes(b'') >> > >> > >> > def recv_payload(connection): >> > """Common function - recv data through a socket & deserialize""" >> > c = connection.recv() >> > c['data'] = [] >> > while True: >> > r = connection.recv_bytes() >> > if len(r) == 0: >> > break >> > c['data'].append(pa.py_buffer(r)) >> > >> > print('...deserialize') >> > return pa.deserialize_components(c) >> > >> > >> > def run_same_process(): >> > """Same process: Send data down a socket, then read data from the >> > matching socket""" >> > print('run_same_process') >> > recv_conn,send_conn = mp.Pipe(duplex=False) >> > payload = make_payload() >> > print(payload) >> > send_payload(payload, send_conn) >> > payload2 = recv_payload(recv_conn) >> > print(payload2) >> > >> > >> > def receiver(recv_conn): >> > """Separate process: runs in a different process, recv data & >> > deserialize""" >> > print('Receiver started') >> > payload = recv_payload(recv_conn) >> > print(payload) >> > >> > >> > def run_separate_process(): >> > """Separate process: launch the child process, then send data""" >> > >> > >> > print('run_separate_process') >> > recv_conn,send_conn = mp.Pipe(duplex=False) >> > process = mp.Process(target=receiver, args=(recv_conn,)) >> > process.start() >> > >> > payload = make_payload() >> > print(payload) >> > send_payload(payload, send_conn) >> > >> > process.join() >> > >> > if __name__ == '__main__': >> > run_same_process() >> > run_separate_process() >> > >> > >> > On Fri, Jul 6, 2018 at 2:42 PM Josh Quigley < >> > josh.quig...@lifetrading.com.au> >> > wrote: >> > >> > > A reproducible program attached - it first runs serialize/deserialize >> > from >> > > the same process, then it does the same work using a separate process >> for >> > > the deserialize. >> > > >> > > The behaviour see is (after the same process code executes happily) is >> > > hanging / child-process crashing during the call to deserialize. >> > > >> > > Is this expected, and if not, is there a known workaround? >> > > >> > > Running Windows 10, conda distribution, with package versions listed >> > > below. I'll also see what happens if I run on *nix. >> > > >> > > - arrow-cpp=0.9.0=py36_vc14_7 >> > > - boost-cpp=1.66.0=vc14_1 >> > > - bzip2=1.0.6=vc14_1 >> > > - hdf5=1.10.2=vc14_0 >> > > - lzo=2.10=vc14_0 >> > > - parquet-cpp=1.4.0=vc14_0 >> > > - snappy=1.1.7=vc14_1 >> > > - zlib=1.2.11=vc14_0 >> > > - blas=1.0=mkl >> > > - blosc=1.14.3=he51fdeb_0 >> > > - cython=0.28.3=py36hfa6e2cd_0 >> > > - icc_rt=2017.0.4=h97af966_0 >> > > - intel-openmp=2018.0.3=0 >> > > - numexpr=2.6.5=py36hcd2f87e_0 >> > > - numpy=1.14.5=py36h9fa60d3_2 >> > > - numpy-base=1.14.5=py36h5c71026_2 >> > > - pandas=0.23.1=py36h830ac7b_0 >> > > - pyarrow=0.9.0=py36hfe5e424_2 >> > > - pytables=3.4.4=py36he6f6034_0 >> > > - python=3.6.6=hea74fb7_0 >> > > - vc=14=h0510ff6_3 >> > > - vs2015_runtime=14.0.25123=3 >> > > >> > > >> > >>