This is the compiler:

> g++ --version
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
And this is how I compile the code:

g++ -W -Wextra -Wall -Wno-unused-parameter -Wno-variadic-macros -Wno-strict-aliasing -Wno-long-long -Wno-unused -fPIC -D__STDC_FORMAT_MACROS -Wno-system-headers -O3 -g -DNDEBUG -D__STDC_LIMIT_MACROS -fno-omit-frame-pointer -std=c++14 -DCPP11 -DARROW_NO_DEPRECATED_API -DUSE_ARROW -I. -DPROJECT_ROOT="\"/opt/scidb/19.11\"" -I"/opt/scidb/19.11/3rdparty/boost/include/" -I"/opt/scidb/19.11/include" -c PhysicalAioSave.cpp -o PhysicalAioSave.o

g++ -W -Wextra -Wall -Wno-unused-parameter -Wno-variadic-macros -Wno-strict-aliasing -Wno-long-long -Wno-unused -fPIC -D__STDC_FORMAT_MACROS -Wno-system-headers -O3 -g -DNDEBUG -D__STDC_LIMIT_MACROS -fno-omit-frame-pointer -std=c++14 -DCPP11 -DARROW_NO_DEPRECATED_API -DUSE_ARROW -I. -DPROJECT_ROOT="\"/opt/scidb/19.11\"" -I"/opt/scidb/19.11/3rdparty/boost/include/" -I"/opt/scidb/19.11/include" -o libaccelerated_io_tools.so plugin.o LogicalSplit.o PhysicalSplit.o LogicalParse.o PhysicalParse.o LogicalAioInput.o PhysicalAioInput.o LogicalAioSave.o PhysicalAioSave.o Functions.o -shared -Wl,-soname,libaccelerated_io_tools.so -L. -L"/opt/scidb/19.11/3rdparty/boost/lib" -L"/opt/scidb/19.11/lib" -Wl,-rpath,/opt/scidb/19.11/lib -lm -larrow

We targeted 0.16.0 because we are still stuck on Python 2.7, and PyPI still has PyArrow binaries for 2.7. Anyway, I temporarily upgraded to 0.17.1, but the result is the same. I also fixed all the deprecation warnings, but that did not help either. Setting a breakpoint might be a challenge since this code runs as a plug-in, but I'll try to isolate this further.
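In case it helps, here is the call sequence from my earlier message (quoted below) boiled down to a standalone sketch against the 0.16.0-style API. This is an untested minimal repro, not the real plug-in code: WriteOneBatch is a made-up name, and the one-column schema and batch are placeholders for what attributes2ArrowSchema produces.

#include <arrow/api.h>
#include <arrow/io/file.h>
#include <arrow/ipc/api.h>

arrow::Status WriteOneBatch(const std::string& fileName) {
  // Placeholder one-column batch standing in for the real SciDB data.
  arrow::Int64Builder builder;
  ARROW_RETURN_NOT_OK(builder.Append(42));
  std::shared_ptr<arrow::Array> array;
  ARROW_RETURN_NOT_OK(builder.Finish(&array));
  auto schema = arrow::schema({arrow::field("x", arrow::int64())});
  auto batch = arrow::RecordBatch::Make(schema, 1, {array});

  // Same call sequence as the plug-in: open the file, open the stream
  // writer (which should emit the schema before the first batch), write
  // one batch, then close the writer and the stream.
  auto result = arrow::io::FileOutputStream::Open(fileName, false);
  std::shared_ptr<arrow::io::OutputStream> stream = result.ValueOrDie();
  std::shared_ptr<arrow::ipc::RecordBatchWriter> writer;
  ARROW_RETURN_NOT_OK(arrow::ipc::RecordBatchStreamWriter::Open(
      stream.get(), schema, &writer));
  ARROW_RETURN_NOT_OK(writer->WriteRecordBatch(*batch));
  ARROW_RETURN_NOT_OK(writer->Close());
  return stream->Close();
}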
Thanks!
Rares

On Mon, Jun 15, 2020 at 9:15 AM Wes McKinney <wesmck...@gmail.com> wrote:

> What compiler are you using?
>
> In 0.16.0 (what you said you were targeting, though it would be better
> for you to upgrade to 0.17.1) schema is written in the CheckStarted
> function here
>
> https://github.com/apache/arrow/blob/apache-arrow-0.16.0/cpp/src/arrow/ipc/writer.cc#L972
>
> Status CheckStarted() {
>   if (!started_) {
>     return Start();
>   }
>   return Status::OK();
> }
>
> started_ is set to false by a default member initializer in the
> protected block. Maybe you should set a breakpoint in this function
> and see if for some reason started_ is true on the first invocation
> (in which case it makes me wonder if there is something
> not-fully-C++11-compliant about your toolchain).
>
> Otherwise I'm a bit stumped since there are lots of production
> applications that use this code.
>
> On Mon, Jun 15, 2020 at 11:01 AM Rares Vernica <rvern...@gmail.com> wrote:
> >
> > Sure, here is briefly what I'm doing:
> >
> > bool append = false;
> > std::shared_ptr<arrow::io::OutputStream> arrowStream;
> > auto arrowResult = arrow::io::FileOutputStream::Open(fileName, append);
> > arrowStream = arrowResult.ValueOrDie();
> >
> > std::shared_ptr<arrow::ipc::RecordBatchWriter> arrowWriter;
> > std::shared_ptr<arrow::RecordBatch> arrowBatch;
> > std::shared_ptr<arrow::RecordBatchReader> arrowReader;
> >
> > std::shared_ptr<arrow::Schema> arrowSchema = attributes2ArrowSchema(
> >     inputSchema, settings.isAttsOnly());
> > ARROW_RETURN_NOT_OK(
> >     arrow::ipc::RecordBatchStreamWriter::Open(
> >         arrowStream.get(), arrowSchema, &arrowWriter));
> >
> > // Setup "arrowReader" using BufferReader and RecordBatchStreamReader
> > ARROW_RETURN_NOT_OK(arrowReader->ReadNext(&arrowBatch));
> > ARROW_RETURN_NOT_OK(arrowWriter->WriteRecordBatch(*arrowBatch));
> > ARROW_RETURN_NOT_OK(arrowWriter->Close());
> > ARROW_RETURN_NOT_OK(arrowStream->Close());
> >
> > On Mon, Jun 15, 2020 at 6:26 AM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > > Can you show the code you are writing? The first thing the stream
> > > writer does before writing any record batch is write the schema. It
> > > sounds like you are using arrow::ipc::WriteRecordBatch somewhere.
> > >
> > > On Sun, Jun 14, 2020, 11:44 PM Rares Vernica <rvern...@gmail.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > I have a RecordBatch that I would like to write to a file. I'm using
> > > > FileOutputStream::Open to open the file and
> > > > RecordBatchStreamWriter::Open to open the stream. I write a record
> > > > batch with WriteRecordBatch. Finally, I close the RecordBatchWriter
> > > > and OutputStream.
> > > >
> > > > The resulting file size is exactly the size of the Buffer used to
> > > > store the RecordBatch. It looks like it is missing the schema. When I
> > > > try to open the resulting file from PyArrow I get:
> > > >
> > > > >>> pa.ipc.open_file('/tmp/1')
> > > > pyarrow.lib.ArrowInvalid: File is too small: 6
> > > >
> > > > $ ll /tmp/1
> > > > -rw-r--r--. 1 root root 720 Jun 15 03:54 /tmp/1
> > > >
> > > > How can I write the schema as well?
> > > >
> > > > I was browsing the documentation at
> > > > https://arrow.apache.org/docs/cpp/index.html but I can't locate any
> > > > C++ documentation about RecordBatchStreamWriter or RecordBatchWriter.
> > > > Is this intentional?
> > > >
> > > > Thank you!
> > > > Rares
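For completeness, a rough sketch of the same write path against the 0.17.x API, where the out-parameter form of RecordBatchStreamWriter::Open is deprecated in favor of arrow::ipc::NewStreamWriter, which returns a Result. WriteBatchStream is a hypothetical helper name, not anything from the thread.

#include <arrow/api.h>
#include <arrow/io/file.h>
#include <arrow/ipc/api.h>
#include <arrow/result.h>

// Hypothetical helper: write a single batch in the IPC stream format.
arrow::Status WriteBatchStream(const std::string& fileName,
                               const std::shared_ptr<arrow::RecordBatch>& batch) {
  ARROW_ASSIGN_OR_RAISE(
      auto stream, arrow::io::FileOutputStream::Open(fileName, /*append=*/false));
  // NewStreamWriter emits the schema message before the first batch.
  ARROW_ASSIGN_OR_RAISE(
      auto writer, arrow::ipc::NewStreamWriter(stream.get(), batch->schema()));
  ARROW_RETURN_NOT_OK(writer->WriteRecordBatch(*batch));
  ARROW_RETURN_NOT_OK(writer->Close());
  return stream->Close();
}

One thing worth double-checking on the reading side: both writer forms produce the IPC stream format, which PyArrow reads back with pa.ipc.open_stream. pa.ipc.open_file expects the file format (framed by the ARROW1 magic and a footer) produced by a RecordBatchFileWriter, so opening a stream-format file with it will fail.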