[ https://issues.apache.org/jira/browse/ARROW-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17661699#comment-17661699 ]
Rok Mihevc commented on ARROW-4677:
-----------------------------------

This issue has been migrated to [issue #21207|https://github.com/apache/arrow/issues/21207] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details.

> [Python] serialization does not consider ndarray endianness
> -----------------------------------------------------------
>
>                 Key: ARROW-4677
>                 URL: https://issues.apache.org/jira/browse/ARROW-4677
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.12.1
>         Environment: * pyarrow 0.12.1
> * numpy 1.16.1
> * Python 3.7.0
> * Intel Core i7-7820HQ
> * (macOS 10.13.6)
>            Reporter: Gabe Joseph
>            Priority: Minor
>              Labels: pyarrow-serialization
>
> {{pa.serialize}} does not appear to properly encode the endianness of
> multi-byte data:
> {code}
> # roundtrip.py
> import numpy as np
> import pyarrow as pa
> arr = np.array([1], dtype=np.dtype('>i2'))
> buf = pa.serialize(arr).to_buffer()
> result = pa.deserialize(buf)
> print(f"Original: {arr.dtype.str}, deserialized: {result.dtype.str}")
> np.testing.assert_array_equal(arr, result)
> {code}
> {code}
> $ pipenv run python roundtrip.py
> Original: >i2, deserialized: <i2
> Traceback (most recent call last):
>   File "roundtrip.py", line 10, in <module>
>     np.testing.assert_array_equal(arr, result)
>   File "/Users/gabejoseph/.local/share/virtualenvs/arrow-roundtrip-1xVSuBtp/lib/python3.7/site-packages/numpy/testing/_private/utils.py", line 896, in assert_array_equal
>     verbose=verbose, header='Arrays are not equal')
>   File "/Users/gabejoseph/.local/share/virtualenvs/arrow-roundtrip-1xVSuBtp/lib/python3.7/site-packages/numpy/testing/_private/utils.py", line 819, in assert_array_compare
>     raise AssertionError(msg)
> AssertionError:
> Arrays are not equal
> Mismatch: 100%
> Max absolute difference: 255
> Max relative difference: 0.99609375
>  x: array([1], dtype=int16)
>  y: array([256], dtype=int16)
> {code}
> The data of the deserialized array is identical (big-endian), but the dtype
> Arrow assigns to it doesn't reflect its endianness (presumably uses the
> system endianness, which is little).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
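
The 1 versus 256 mismatch in the report is what you would expect if big-endian int16 bytes were read back as little-endian. The sketch below reproduces that arithmetic with only the standard library `struct` module. It does not call pyarrow and is not a description of Arrow's internals, only an illustration of the byte-order mix-up the reporter suspects.

```python
import struct

# Pack the int16 value 1 in big-endian order, mirroring the '>i2' array
# from the report. The payload is the two bytes 0x00 0x01.
payload = struct.pack(">h", 1)

# Interpreting the same bytes as little-endian ('<i2') yields 0x0100.
as_little = struct.unpack("<h", payload)[0]

# Interpreting them with the original byte order recovers the value.
as_big = struct.unpack(">h", payload)[0]

print(f"little-endian read: {as_little}, big-endian read: {as_big}")
print(f"absolute difference: {abs(as_little - as_big)}")
```

The difference of 255 matches the "Max absolute difference" in the traceback. A possible workaround on the NumPy side, not confirmed in this thread, would be to convert the array to native byte order before serializing, for example with `arr.astype(arr.dtype.newbyteorder('='))`.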