Gabe Joseph created ARROW-4675:
----------------------------------

             Summary: [Python] Error serializing bool ndarray in py2 and deserializing in py3
                 Key: ARROW-4675
                 URL: https://issues.apache.org/jira/browse/ARROW-4675
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.12.0
         Environment: * pyarrow 0.12.0
* numpy 1.16.1
* Python 3.7.0, 2.7.15
* (macOS 10.13.6)
            Reporter: Gabe Joseph


{{np.bool}} is the only dtype I've found that causes this issue. Both empty and 
non-empty arrays cause it.

The issue only manifests from py2 to py3; staying within the same version 
succeeds, as does serializing from py3 and deserializing in py2.

This appears to be caused by a Python 2 {{str}} being deserialized in Python 3 
as {{bytes}}; it would need to be {{unicode}} on the py2 end to come back as 
{{str}} in py3. I suppose something in the serialization implementation is 
writing the dtype (just for bool arrays?) as a {{str}}, but I haven't dug into 
it yet.
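
To illustrate the suspected mechanism, here is a minimal Python 3 sketch that 
does not use pyarrow at all. It assumes the dtype name arrives as {{bytes}} and 
is then concatenated onto a {{str}}, which is only my guess at what the 
deserialization callback does. The names {{dtype_name}} and {{normalize}} are 
hypothetical and are not part of pyarrow.

```python
# Python 3 sketch of the suspected failure mode; no pyarrow involved.
# `dtype_name` and `normalize` are hypothetical names used for illustration.

def normalize(value):
    """Decode bytes to str; return str values unchanged."""
    if isinstance(value, bytes):
        return value.decode("ascii")
    return value

# Assumption: a Python 2 str written by the serializer comes back as bytes.
dtype_name = b"bool"

try:
    "numpy." + dtype_name  # str + bytes is not allowed in Python 3
    raised = False
except TypeError:
    raised = True

# Decoding first makes the concatenation succeed.
fixed = "numpy." + normalize(dtype_name)
print(raised, fixed)
```

If this guess is right, decoding on the py3 side or writing {{unicode}} on the 
py2 side would both avoid the error, but I have not confirmed where the value 
is actually produced.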


{code:bash}
(two)bash-3.2$ python cereal.py
(two)bash-3.2$ cat cereal.py 
# Python 2
import numpy as np
import pyarrow as pa

data = np.array([], dtype=np.dtype('bool'))
buf = pa.serialize(data).to_buffer()

outstream = pa.output_stream("buffer")
outstream.write(buf)
outstream.close()

# ...switch to python 3 venv...
(three)bash-3.2$ cat decereal.py 
# Python 3
import numpy as np
import pyarrow as pa

instream = pa.input_stream("buffer")
buf = instream.read()

data = pa.deserialize(buf)
print(data)
(three)bash-3.2$ python3 decereal.py 
Traceback (most recent call last):
  File "decereal.py", line 10, in <module>
    data = pa.deserialize(buf)
  File "pyarrow/serialization.pxi", line 448, in pyarrow.lib.deserialize
  File "pyarrow/serialization.pxi", line 411, in pyarrow.lib.deserialize_from
  File "pyarrow/serialization.pxi", line 262, in pyarrow.lib.SerializedPyObject.deserialize
  File "pyarrow/serialization.pxi", line 175, in pyarrow.lib.SerializationContext._deserialize_callback
TypeError: can only concatenate str (not "bytes") to str
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
