Hi Wes,

Thanks for your answer. I finally got to test this out. To recap, I'm
writing Arrow files from C++ using Arrow 0.9.0.
Then, I'm trying to read these files from Python. I tried Python 2.7.15
with PyArrow 0.10.0 through 0.13.0, and in all these cases I get an
error. (PyArrow 0.9.0 works fine, as expected.)

> python2 -c "import pyarrow; pyarrow.ipc.open_stream('/tmp/foo').read_all()"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/foo/.local/lib/python2.7/site-packages/pyarrow/ipc.py", line 123, in open_stream
    return RecordBatchStreamReader(source)
  File "/home/foo/.local/lib/python2.7/site-packages/pyarrow/ipc.py", line 58, in __init__
    self._open(source)
  File "pyarrow/ipc.pxi", line 312, in pyarrow.lib._RecordBatchStreamReader._open
  File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Expected to read 1886221359 metadata bytes, but only read 8

> python2 -c "import pyarrow; pyarrow.RecordBatchStreamReader('/tmp/foo').read_all()"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/foo/.local/lib/python2.7/site-packages/pyarrow/ipc.py", line 58, in __init__
    self._open(source)
  File "pyarrow/ipc.pxi", line 312, in pyarrow.lib._RecordBatchStreamReader._open
  File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Expected to read 1886221359 metadata bytes, but only read 8

On Python 3, on the other hand, all these cases work fine. (A guess at
the cause and a possible workaround are in the P.S. below the quoted
thread.)

Thanks!
Rares

On Mon, Mar 11, 2019 at 7:16 AM Wes McKinney <wesmck...@gmail.com> wrote:

> hi Rares -- IPC messages produced by 0.9.0 should be forward
> compatible. I opened https://issues.apache.org/jira/browse/ARROW-921
> some time ago about adding some tools to integration test one version
> versus another to obtain hard proof of this, but this work has not
> been completed yet (any takers?).
>
> Have you encountered any problems?
>
> Thanks,
> Wes
>
> On Sun, Mar 10, 2019 at 11:49 PM Rares Vernica <rvern...@gmail.com> wrote:
> >
> > Hello,
> >
> > I have a C++ library using Arrow 0.9.0 to serialize data. The code
> > looks like this:
> >
> >     std::shared_ptr<arrow::RecordBatch> arrowBatch;
> >     arrowBatch = arrow::RecordBatch::Make(_arrowSchema, nCells,
> >         _arrowArrays);
> >
> >     std::shared_ptr<arrow::PoolBuffer> arrowBuffer(
> >         new arrow::PoolBuffer(_arrowPool));
> >     arrow::io::BufferOutputStream arrowStream(arrowBuffer);
> >
> >     std::shared_ptr<arrow::ipc::RecordBatchWriter> arrowWriter;
> >     arrow::ipc::RecordBatchStreamWriter::Open(&arrowStream,
> >         _arrowSchema, &arrowWriter);
> >
> >     arrowWriter->WriteRecordBatch(*arrowBatch);
> >     ...
> >     reinterpret_cast<const char*>(arrowBuffer->data()),
> >         arrowBuffer->size())
> >     ...
> >
> > The output bytes are then read in Python using pyarrow:
> >
> >     pyarrow.RecordBatchStreamReader(
> >         pyarrow.BufferReader(buf)).read_pandas()
> >
> > Since the C++ side uses Arrow 0.9.0, I have been using
> > pyarrow==0.9.0. When using Python 3.7, getting pyarrow==0.9.0 is not
> > easy since there are no pre-compiled .whl packages on PyPI.
> >
> > I wonder if I could use newer pyarrow versions to parse the Arrow
> > 0.9.0 output? Is the format compatible?
> >
> > Thanks!
> > Rares
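P.S. One clue from the error constant: 1886221359 is 0x706D742F, which
is the ASCII bytes "/tmp" read as a little-endian 32-bit integer. So it
looks like, on Python 2, the str path '/tmp/foo' is being treated as an
in-memory buffer of stream bytes rather than as a file name, and the
reader then parses the path bytes themselves as metadata. Assuming that
is the cause, passing an explicit file object should remove the
ambiguity. A minimal, untested sketch:

    import pyarrow

    # Open the file ourselves so the argument is unambiguously a
    # file-like object; on Python 2 a plain str might otherwise be
    # interpreted as an in-memory buffer of stream bytes.
    with open('/tmp/foo', 'rb') as f:
        table = pyarrow.ipc.open_stream(f).read_all()

If that reads the file correctly, the Python 2 failure would be a
path-vs-buffer dispatch issue in the newer readers rather than a format
incompatibility with streams written by 0.9.0.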