Re: C++ Write Schema with RecordBatchStreamWriter

2020-06-16 Thread Rares Vernica
Thanks a lot, Wes! That was the issue. Good catch!

On Tue, Jun 16, 2020 at 9:39 AM Wes McKinney wrote:
> It looks like on Python 2.7 that the open_stream/open_file functions
> are treating the file name that you are passing as a binary buffer
> rather than a file path (inferring from the fact th

Re: C++ Write Schema with RecordBatchStreamWriter

2020-06-16 Thread Wes McKinney
It looks like on Python 2.7 the open_stream/open_file functions are treating the file name that you are passing as a binary buffer rather than a file path (inferring from the fact that '1' is one byte in Py2.7 and 'foo' is 3 bytes). Try passing an open file handle instead.

On Tue, Jun 16, 2020
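
The ambiguity described in this message can be illustrated with a small sketch using only the Python 3 standard library. It is not PyArrow code: open_source is a hypothetical stand-in for a reader that accepts either raw bytes or an open file, and the temporary file and 64-byte payload are made up for the illustration. On Python 2.7 every plain string is a byte string, so a file name would take the first branch.

```python
import io
import os
import tempfile


def open_source(source):
    """Hypothetical stand-in for a reader's source handling (not PyArrow code).

    Bytes-like input is treated as the data itself; anything with a
    ``read`` method is treated as an open file.
    """
    if isinstance(source, (bytes, bytearray)):
        return io.BytesIO(source)   # interpreted as an in-memory buffer
    if hasattr(source, "read"):
        return source               # interpreted as an open file handle
    raise TypeError("unsupported source: %r" % type(source))


# Write a small, made-up payload to a temporary file.
payload = b"x" * 64
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(payload)

# A path that arrives as bytes (as every plain string does on Python 2.7)
# is read as a buffer: the "data" is just the characters of the file name.
as_buffer = open_source(os.fsencode(path)).read()

# An open file handle is unambiguous: the reader sees the file contents.
with open(path, "rb") as handle:
    as_file = open_source(handle).read()

os.remove(path)
print(as_buffer == os.fsencode(path), len(as_file))
```

In this sketch the first read returns the encoded file name rather than the file contents, which matches the symptom of the reported size tracking the length of the name, while the read through the open handle returns the 64 bytes that were written.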

Re: C++ Write Schema with RecordBatchStreamWriter

2020-06-16 Thread Rares Vernica
Thank you for your help in getting to the bottom of this. It seems that the problem is not in the C++ code but in the PyArrow/Python 2.7 combination. Here are more details.

I have two C++ programs writing two Arrow files. The first one is the bigger plugin I'm attempting to port and the second o

Re: C++ Write Schema with RecordBatchStreamWriter

2020-06-15 Thread Micah Kornfield
Hi Rares,

This last issue sounds like you are trying to write data from version 0.16.0 of the library and read it from a pre-0.15.0 version of the Python library. If you want to do this you need to set "bool write_legacy_ipc_format" to true on IpcWriterOptions/IpcOptions object and construct the

Re: C++ Write Schema with RecordBatchStreamWriter

2020-06-15 Thread Rares Vernica
With open_stream I get a different error:

> python -c "import pyarrow; pyarrow.ipc.open_stream('/tmp/foo')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pyarrow/ipc.py", line 137, in open_stream
    return RecordBatchStreamReader(source)

Re: C++ Write Schema with RecordBatchStreamWriter

2020-06-15 Thread Wes McKinney
On Mon, Jun 15, 2020 at 11:24 PM Rares Vernica wrote:
>
> I was able to reproduce my issue in a small, fully-contained, program. Here
> is the source code:
>
> #include
> #include
> #include
> #include
>
> arrow::Status foo() {
> std::shared_ptr arrowStream;
> std::shared_ptr arrowWriter;

Re: C++ Write Schema with RecordBatchStreamWriter

2020-06-15 Thread Rares Vernica
I was able to reproduce my issue in a small, fully contained program. Here is the source code:

#include
#include
#include
#include

arrow::Status foo() {
std::shared_ptr arrowStream;
std::shared_ptr arrowWriter;
std::shared_ptr arrowBatch;
std::shared_ptr arrowReader;
std::vector>

Re: C++ Write Schema with RecordBatchStreamWriter

2020-06-15 Thread Rares Vernica
This is the compiler:

> g++ --version
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609

And this is how I compile the code:

g++ -W -Wextra -Wall -Wno-unused-parameter -Wno-variadic-macros -Wno-strict-aliasing -Wno-long-long -Wno-unused -fPIC -D_STDC_FORMAT_MACROS -Wno-system-headers -O3 -g -DN

Re: C++ Write Schema with RecordBatchStreamWriter

2020-06-15 Thread Wes McKinney
What compiler are you using? In 0.16.0 (what you said you were targeting, though it would be better for you to upgrade to 0.17.1) the schema is written in the CheckStarted function here:

https://github.com/apache/arrow/blob/apache-arrow-0.16.0/cpp/src/arrow/ipc/writer.cc#L972

Status CheckStarted() {

Re: C++ Write Schema with RecordBatchStreamWriter

2020-06-15 Thread Rares Vernica
Sure, here is briefly what I'm doing:

bool append = false;
std::shared_ptr arrowStream;
auto arrowResult = arrow::io::FileOutputStream::Open(fileName, append);
arrowStream = arrowResult.ValueOrDie();
std::shared_ptr arrowWriter;
std::shared_ptr arrowBatch;
std::shared_

Re: C++ Write Schema with RecordBatchStreamWriter

2020-06-15 Thread Wes McKinney
Can you show the code you are writing? The first thing the stream writer does before writing any record batch is write the schema. It sounds like you are using arrow::ipc::WriteRecordBatch somewhere.

On Sun, Jun 14, 2020, 11:44 PM Rares Vernica wrote:
> Hello,
>
> I have a RecordBatch that I wou

C++ Write Schema with RecordBatchStreamWriter

2020-06-14 Thread Rares Vernica
Hello, I have a RecordBatch that I would like to write to a file. I'm using FileOutputStream::Open to open the file and RecordBatchStreamWriter::Open to open the stream. I write a record batch with WriteRecordBatch. Finally, I close the RecordBatchWriter and OutputStream. The resulting file size