Gabe Joseph created ARROW-4677:
----------------------------------

             Summary: [Python] serialization does not consider ndarray endianness
                 Key: ARROW-4677
                 URL: https://issues.apache.org/jira/browse/ARROW-4677
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.12.1
         Environment: * pyarrow 0.12.1
* numpy 1.16.1
* Python 3.7.0
* Intel Core i7-7820HQ
* (macOS 10.13.6)
            Reporter: Gabe Joseph


{{pa.serialize}} does not appear to properly encode the endianness of 
multi-byte data:
{code}
# roundtrip.py 
import numpy as np
import pyarrow as pa

arr = np.array([1], dtype=np.dtype('>i2'))

buf = pa.serialize(arr).to_buffer()
result = pa.deserialize(buf)

print(f"Original: {arr.dtype.str}, deserialized: {result.dtype.str}")
np.testing.assert_array_equal(arr, result)
{code}
{code}
$ pipenv run python roundtrip.py
Original: >i2, deserialized: <i2
Traceback (most recent call last):
  File "roundtrip.py", line 10, in <module>
    np.testing.assert_array_equal(arr, result)
  File "/Users/gabejoseph/.local/share/virtualenvs/arrow-roundtrip-1xVSuBtp/lib/python3.7/site-packages/numpy/testing/_private/utils.py", line 896, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "/Users/gabejoseph/.local/share/virtualenvs/arrow-roundtrip-1xVSuBtp/lib/python3.7/site-packages/numpy/testing/_private/utils.py", line 819, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not equal

Mismatch: 100%
Max absolute difference: 255
Max relative difference: 0.99609375
 x: array([1], dtype=int16)
 y: array([256], dtype=int16)
{code}

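The raw bytes of the deserialized array are identical to the original 
(big-endian), but the dtype Arrow assigns to the result does not reflect that 
byte order. It presumably uses the system's native byte order, which is 
little-endian on this machine, so the same bytes are read back as 256 instead 
of 1.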
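
The effect can be illustrated with NumPy alone. The sketch below assumes the
explanation above is right (the bytes are kept but relabelled as
little-endian); that has not been confirmed against the Arrow source. The
helper name to_native is hypothetical and only shows one possible workaround,
converting to native byte order before serializing.

```python
import numpy as np

# Same big-endian int16 array as in roundtrip.py.
arr = np.array([1], dtype=np.dtype('>i2'))

# Keep the bytes, change only the dtype label to little-endian.
# This mimics the suspected behaviour; it is an assumption, not Arrow's code.
reinterpreted = arr.view(np.dtype('<i2'))
print(arr.tobytes(), reinterpreted)  # b'\x00\x01' [256]


def to_native(a):
    """Return a copy of `a` with the same values in native byte order.

    Hypothetical helper: astype() converts the values (swapping bytes if
    needed), unlike view(), which only relabels the existing bytes.
    """
    return a.astype(a.dtype.newbyteorder('='))


native = to_native(arr)
print(native.dtype.isnative, native)  # True [1]
```

Passing to_native(arr) to pa.serialize instead of arr should avoid the
mismatch in values, at the cost of losing the original byte order in the
round-tripped dtype.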



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
