[ 
https://issues.apache.org/jira/browse/ARROW-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-5260:
------------------------------
    External issue URL: https://github.com/apache/arrow/issues/21731

> [Python][C++] Crash when deserializing from components in a fresh new process
> -----------------------------------------------------------------------------
>
>                 Key: ARROW-5260
>                 URL: https://issues.apache.org/jira/browse/ARROW-5260
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 0.12.0, 0.12.1, 0.13.0
>            Reporter: Yevgeni Litvin
>            Assignee: Antoine Pitrou
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.14.0
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Deserializing a table from its components in a fresh process crashes with
> SIGSEGV:
> {noformat}
> #1 0x00007fffd5eb93f0 in arrow::py::unwrap_buffer(_object*, std::shared_ptr<arrow::Buffer>*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/./libarrow_python.so.13
> #2 0x00007fffd5e69260 in arrow::py::GetSerializedFromComponents(int, int, int, _object*, arrow::py::SerializedPyObject*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/./libarrow_python.so.13
> #3 0x00007fffd6b1cafe in __pyx_pw_7pyarrow_3lib_18SerializedPyObject_7from_components(_object*, _object*, _object*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
> #4 0x00000000004ad919 in PyCFunction_Call ()
> #5 0x00007fffd6a88d10 in __Pyx_PyObject_Call(_object*, _object*, _object*) [clone .constprop.1186] () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
> #6 0x00007fffd6a41872 in __Pyx__PyObject_CallOneArg(_object*, _object*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
> #7 0x00007fffd6a89e59 in __Pyx_PyObject_CallOneArg(_object*, _object*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
> #8 0x00007fffd6ab087f in __pyx_pw_7pyarrow_3lib_165deserialize_components(_object*, _object*, _object*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
> #9 0x00000000004adca7 in _PyCFunction_FastCallKeywords ()
> #10 0x0000000000545e34 in ?? ()
> #11 0x000000000054ac8c in _PyEval_EvalFrameDefault ()
> #12 0x0000000000545a51 in ?? ()
> #13 0x0000000000546890 in PyEval_EvalCode ()
> #14 0x000000000042a9a8 in PyRun_FileExFlags ()
> #15 0x000000000042ab8d in PyRun_SimpleFileExFlags ()
> #16 0x000000000043e0ba in Py_Main ()
> #17 0x0000000000421b04 in main ()
> {noformat}
>  The following snippet can be used to reproduce the issue:
> {code:python}
> import pickle
> import sys
> import pandas as pd
> import pyarrow as pa
> if __name__ == '__main__':
>     if sys.argv[1] == 'w':
>         df = pd.DataFrame({'int': [1, 2], 'str': ['a', 'b']})
>         table = pa.Table.from_pandas(df)
>         table_serialized = pa.serialize(table)
>         table_serialized_components = table_serialized.to_components()
>         with open('/tmp/p.pickle', 'wb') as f:
>             pickle.dump(table_serialized_components, f)
>         print('/tmp/p.pickle written ok')
>     if sys.argv[1] == 'r':
>         # UNCOMMENT THE FOLLOWING LINE TO AVOID THE CRASH
>         #pa.serialize(0)
>         with open('/tmp/p.pickle', 'rb') as f:
>             table_serialized_components = pickle.load(f)
>         table = pa.deserialize_components(table_serialized_components)
>         print(table)
> {code}
> Then run:
> {code:java}
> $ python pa_serialization_crashes.py w
> /tmp/p.pickle written ok
> $ python pa_serialization_crashes.py r
> Segmentation fault (core dumped){code}
> The crash does not occur if unrelated data is serialized before the
> deserialization (see the commented-out pa.serialize(0) line in the
> reproduction snippet).
>  
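
Because the crash only shows up in a fresh process, it can help to launch the reader step from a small driver that starts a new interpreter and inspects the exit status. The sketch below is not part of the original report: the helper name `run_in_fresh_process` is hypothetical, and it assumes a POSIX system, where a child killed by signal N is reported by `subprocess` with return code -N.

```python
import signal
import subprocess
import sys


def run_in_fresh_process(argv):
    """Run `python <argv...>` in a new interpreter.

    Returns (returncode, crashed), where crashed is True if the child
    was terminated by SIGSEGV. Hypothetical helper, POSIX only.
    """
    proc = subprocess.run([sys.executable, *argv])
    # On POSIX, a child killed by signal N has returncode == -N.
    crashed = proc.returncode == -signal.SIGSEGV
    return proc.returncode, crashed
```

Assuming the reproduction script is saved as pa_serialization_crashes.py, calling `run_in_fresh_process(["pa_serialization_crashes.py", "r"])` on an affected version would be expected to report a crash, matching the "Segmentation fault" shown in the report.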



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
