Hi Wes,

Thanks for your answer. I finally got to test this out. To recap, I'm
writing Arrow files from C++ using Arrow 0.9.0.

Then, I try to read these files from Python. I tried Python 2.7.15 with
PyArrow 0.10.0 through 0.13.0, and in all these cases I get an error.
(PyArrow 0.9.0 works fine, as expected.)

> python2 -c "import pyarrow; pyarrow.ipc.open_stream('/tmp/foo').read_all()"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/foo/.local/lib/python2.7/site-packages/pyarrow/ipc.py", line 123, in open_stream
    return RecordBatchStreamReader(source)
  File "/home/foo/.local/lib/python2.7/site-packages/pyarrow/ipc.py", line 58, in __init__
    self._open(source)
  File "pyarrow/ipc.pxi", line 312, in pyarrow.lib._RecordBatchStreamReader._open
  File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Expected to read 1886221359 metadata bytes, but only read 8

> python2 -c "import pyarrow; pyarrow.RecordBatchStreamReader('/tmp/foo').read_all()"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/foo/.local/lib/python2.7/site-packages/pyarrow/ipc.py", line 58, in __init__
    self._open(source)
  File "pyarrow/ipc.pxi", line 312, in pyarrow.lib._RecordBatchStreamReader._open
  File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Expected to read 1886221359 metadata bytes, but only read 8

On the other hand, all these cases work fine on Python 3.
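
For reference, the Python 3 check is essentially the equivalent of the
commands above. A minimal sketch (assuming '/tmp/foo' is the same stream
file written by the Arrow 0.9.0 C++ code quoted below):

import pyarrow

# Open the stream file produced by the Arrow 0.9.0 C++ writer and
# materialize it as a Table; on Python 3 this succeeds with 0.10.0-0.13.0.
reader = pyarrow.ipc.open_stream('/tmp/foo')
table = reader.read_all()
print(table.schema)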

Thanks!
Rares


On Mon, Mar 11, 2019 at 7:16 AM Wes McKinney <wesmck...@gmail.com> wrote:

> hi Rares -- IPC messages produced by 0.9.0 should be forward
> compatible. I opened https://issues.apache.org/jira/browse/ARROW-921
> some time ago about adding some tools to integration test one version
> versus another to obtain hard proof of this, but this work has not
> been completed yet (any takers?).
>
> Have you encountered any problems?
>
> Thanks,
> Wes
>
> On Sun, Mar 10, 2019 at 11:49 PM Rares Vernica <rvern...@gmail.com> wrote:
> >
> > Hello,
> >
> > I have a C++ library using Arrow 0.9.0 to serialize data. The code looks
> > like this:
> >
> > std::shared_ptr<arrow::RecordBatch> arrowBatch;
> > arrowBatch = arrow::RecordBatch::Make(_arrowSchema, nCells, _arrowArrays);
> >
> > std::shared_ptr<arrow::PoolBuffer> arrowBuffer(new arrow::PoolBuffer(_arrowPool));
> > arrow::io::BufferOutputStream arrowStream(arrowBuffer);
> >
> > std::shared_ptr<arrow::ipc::RecordBatchWriter> arrowWriter;
> > arrow::ipc::RecordBatchStreamWriter::Open(&arrowStream, _arrowSchema, &arrowWriter);
> >
> > arrowWriter->WriteRecordBatch(*arrowBatch);
> > ...
> > reinterpret_cast<const char*>(arrowBuffer->data()), arrowBuffer->size())
> > ...
> >
> > The output bytes are then read in Python using pyarrow:
> >
> > pyarrow.RecordBatchStreamReader(pyarrow.BufferReader(buf)).read_pandas()
> >
> > Since the C++ side uses Arrow 0.9.0, I have been using pyarrow==0.9.0.
> > When using Python 3.7, getting pyarrow==0.9.0 is not easy since there are
> > no pre-compiled .whl packages on PyPI.
> >
> > I wonder if I could use newer pyarrow versions to parse the Arrow 0.9.0
> > output? Is the format compatible?
> >
> > Thanks!
> > Rares
>
