Gabe Joseph created ARROW-4677:
----------------------------------

             Summary: [Python] serialization does not consider ndarray endianness
                 Key: ARROW-4677
                 URL: https://issues.apache.org/jira/browse/ARROW-4677
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.12.1
         Environment: * pyarrow 0.12.1
                      * numpy 1.16.1
                      * Python 3.7.0
                      * Intel Core i7-7820HQ
                      * (macOS 10.13.6)
            Reporter: Gabe Joseph

{{pa.serialize}} does not appear to properly encode the endianness of multi-byte data:

{code}
# roundtrip.py
import numpy as np
import pyarrow as pa

arr = np.array([1], dtype=np.dtype('>i2'))
buf = pa.serialize(arr).to_buffer()
result = pa.deserialize(buf)

print(f"Original: {arr.dtype.str}, deserialized: {result.dtype.str}")
np.testing.assert_array_equal(arr, result)
{code}

{code}
$ pipenv run python roundtrip.py
Original: >i2, deserialized: <i2
Traceback (most recent call last):
  File "roundtrip.py", line 10, in <module>
    np.testing.assert_array_equal(arr, result)
  File "/Users/gabejoseph/.local/share/virtualenvs/arrow-roundtrip-1xVSuBtp/lib/python3.7/site-packages/numpy/testing/_private/utils.py", line 896, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "/Users/gabejoseph/.local/share/virtualenvs/arrow-roundtrip-1xVSuBtp/lib/python3.7/site-packages/numpy/testing/_private/utils.py", line 819, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Arrays are not equal

Mismatch: 100%
Max absolute difference: 255
Max relative difference: 0.99609375
 x: array([1], dtype=int16)
 y: array([256], dtype=int16)
{code}

The underlying bytes of the deserialized array are identical to the original (big-endian), but the dtype Arrow assigns to the result doesn't reflect that byte order (presumably it uses the system's native endianness, which is little-endian here), so the same bytes are read back as a different value.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
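
The mismatch reported above (1 versus 256) is what you would expect when big-endian bytes are reinterpreted as little-endian. The following is a minimal sketch of that effect using only the Python standard library. It does not call pyarrow and is not a claim about where in Arrow the byte order is lost; the variable names are illustrative.

```python
import struct

# The value 1 packed as a big-endian signed 16-bit integer: bytes 0x00 0x01.
# This is the same byte layout as the single element of the '>i2' array.
raw = struct.pack(">h", 1)

# Reading the bytes with the byte order they were written in recovers 1.
(as_big,) = struct.unpack(">h", raw)

# Reading the *same* bytes as little-endian treats 0x01 as the high byte,
# which gives 256, the value seen in the deserialized array.
(as_little,) = struct.unpack("<h", raw)

print(raw.hex(), as_big, as_little)  # prints: 0001 1 256
```

The difference of 255 and the ratio 255/256 = 0.99609375 match the "Max absolute difference" and "Max relative difference" lines in the assertion output, which is consistent with the bytes being preserved and only the dtype's byte order being changed.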