[ https://issues.apache.org/jira/browse/ARROW-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rok Mihevc updated ARROW-5260:
------------------------------
    External issue URL: https://github.com/apache/arrow/issues/21731

> [Python][C++] Crash when deserializing from components in a fresh new process
> -----------------------------------------------------------------------------
>
>                 Key: ARROW-5260
>                 URL: https://issues.apache.org/jira/browse/ARROW-5260
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 0.12.0, 0.12.1, 0.13.0
>            Reporter: Yevgeni Litvin
>            Assignee: Antoine Pitrou
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.14.0
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Deserializing a table from components in a fresh process crashes with SIGSEGV:
> {noformat}
> #1 0x00007fffd5eb93f0 in arrow::py::unwrap_buffer(_object*, std::shared_ptr<arrow::Buffer>*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/./libarrow_python.so.13
> #2 0x00007fffd5e69260 in arrow::py::GetSerializedFromComponents(int, int, int, _object*, arrow::py::SerializedPyObject*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/./libarrow_python.so.13
> #3 0x00007fffd6b1cafe in __pyx_pw_7pyarrow_3lib_18SerializedPyObject_7from_components(_object*, _object*, _object*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
> #4 0x00000000004ad919 in PyCFunction_Call ()
> #5 0x00007fffd6a88d10 in __Pyx_PyObject_Call(_object*, _object*, _object*) [clone .constprop.1186] () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
> #6 0x00007fffd6a41872 in __Pyx__PyObject_CallOneArg(_object*, _object*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
> #7 0x00007fffd6a89e59 in __Pyx_PyObject_CallOneArg(_object*, _object*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
> #8 0x00007fffd6ab087f in __pyx_pw_7pyarrow_3lib_165deserialize_components(_object*, _object*, _object*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
> #9 0x00000000004adca7 in _PyCFunction_FastCallKeywords ()
> #10 0x0000000000545e34 in ?? ()
> #11 0x000000000054ac8c in _PyEval_EvalFrameDefault ()
> #12 0x0000000000545a51 in ?? ()
> #13 0x0000000000546890 in PyEval_EvalCode ()
> #14 0x000000000042a9a8 in PyRun_FileExFlags ()
> #15 0x000000000042ab8d in PyRun_SimpleFileExFlags ()
> #16 0x000000000043e0ba in Py_Main ()
> #17 0x0000000000421b04 in main ()
> {noformat}
> The following snippet reproduces the issue:
> {code:java}
> import pickle
> import sys
> import pandas as pd
> import pyarrow as pa
>
> if __name__ == '__main__':
>     if sys.argv[1] == 'w':
>         df = pd.DataFrame({'int': [1, 2], 'str': ['a', 'b']})
>         table = pa.Table.from_pandas(df)
>         table_serialized = pa.serialize(table)
>         table_serialized_components = table_serialized.to_components()
>         with open('/tmp/p.pickle', 'wb') as f:
>             pickle.dump(table_serialized_components, f)
>         print('/tmp/p.pickle written ok')
>     if sys.argv[1] == 'r':
>         # UNCOMMENT THE FOLLOWING LINE TO AVOID THE CRASH
>         #pa.serialize(0)
>         with open('/tmp/p.pickle', 'rb') as f:
>             table_serialized_components = pickle.load(f)
>         table = pa.deserialize_components(table_serialized_components)
>         print(table)
> {code}
> Then run:
> {code:java}
> $ python pa_serialization_crashes.py w
> /tmp/p.pickle written ok
> $ python pa_serialization_crashes.py r
> Segmentation fault (core dumped)
> {code}
> The crash does not occur if unrelated data is serialized before the
> deserialization (see the commented-out line in the reproduction snippet).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
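Since the report depends on the read step running in a fresh process, the two steps could be driven from a small wrapper that starts a new interpreter for each one and reports whether it died from a signal. The following is only a sketch under stated assumptions: the helper names `describe_exit` and `run_in_fresh_process` are hypothetical and not part of pyarrow, and the "negative return code means killed by signal" convention is the POSIX behaviour of Python's `subprocess` module.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as return code -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal %d" % -returncode
        return "killed by " + name
    return "exited with code %d" % returncode


def run_in_fresh_process(args):
    """Run `python <args...>` in a new interpreter and describe how it ended.

    Hypothetical helper: nothing is imported in the child beforehand, which
    matches the "fresh process" condition described in the report.
    """
    completed = subprocess.run([sys.executable] + list(args))
    return describe_exit(completed.returncode)
```

Assuming the reproduction script is saved as `pa_serialization_crashes.py`, calling `run_in_fresh_process(["pa_serialization_crashes.py", "w"])` and then the same with `"r"` would exercise the write and read steps in separate interpreters; on an affected version the second call would be expected to report a SIGSEGV rather than `ok`.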