What compiler are you using? In 0.16.0 (which you said you are targeting, though it would be better to upgrade to 0.17.1), the schema is written in the CheckStarted function here:
https://github.com/apache/arrow/blob/apache-arrow-0.16.0/cpp/src/arrow/ipc/writer.cc#L972

  Status CheckStarted() {
    if (!started_) {
      return Start();
    }
    return Status::OK();
  }

started_ is set to false by a default member initializer in the protected block. Maybe you should set a breakpoint in this function and see if for some reason started_ is true on the first invocation (in which case it makes me wonder if there is something not fully C++11-compliant about your toolchain). Otherwise I'm a bit stumped, since there are lots of production applications that use this code.

On Mon, Jun 15, 2020 at 11:01 AM Rares Vernica <rvern...@gmail.com> wrote:
>
> Sure, here is briefly what I'm doing:
>
>   bool append = false;
>   std::shared_ptr<arrow::io::OutputStream> arrowStream;
>   auto arrowResult = arrow::io::FileOutputStream::Open(fileName, append);
>   arrowStream = arrowResult.ValueOrDie();
>
>   std::shared_ptr<arrow::ipc::RecordBatchWriter> arrowWriter;
>   std::shared_ptr<arrow::RecordBatch> arrowBatch;
>   std::shared_ptr<arrow::RecordBatchReader> arrowReader;
>
>   std::shared_ptr<arrow::Schema> arrowSchema = attributes2ArrowSchema(
>       inputSchema, settings.isAttsOnly());
>   ARROW_RETURN_NOT_OK(
>       arrow::ipc::RecordBatchStreamWriter::Open(
>           arrowStream.get(), arrowSchema, &arrowWriter));
>
>   // Setup "arrowReader" using BufferReader and RecordBatchStreamReader
>   ARROW_RETURN_NOT_OK(arrowReader->ReadNext(&arrowBatch));
>   ARROW_RETURN_NOT_OK(arrowWriter->WriteRecordBatch(*arrowBatch));
>   ARROW_RETURN_NOT_OK(arrowWriter->Close());
>   ARROW_RETURN_NOT_OK(arrowStream->Close());
>
> On Mon, Jun 15, 2020 at 6:26 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > Can you show the code you are writing? The first thing the stream writer does before writing any record batch is write the schema. It sounds like you are using arrow::ipc::WriteRecordBatch somewhere.
> >
> > On Sun, Jun 14, 2020, 11:44 PM Rares Vernica <rvern...@gmail.com> wrote:
> >
> > > Hello,
> > >
> > > I have a RecordBatch that I would like to write to a file. I'm using FileOutputStream::Open to open the file and RecordBatchStreamWriter::Open to open the stream. I write a record batch with WriteRecordBatch. Finally, I close the RecordBatchWriter and OutputStream.
> > >
> > > The resulting file size is exactly the size of the Buffer used to store the RecordBatch. It looks like it is missing the schema. When I try to open the resulting file from PyArrow I get:
> > >
> > > >>> pa.ipc.open_file('/tmp/1')
> > > pyarrow.lib.ArrowInvalid: File is too small: 6
> > >
> > > $ ll /tmp/1
> > > -rw-r--r--. 1 root root 720 Jun 15 03:54 /tmp/1
> > >
> > > How can I write the schema as well?
> > >
> > > I was browsing the documentation at https://arrow.apache.org/docs/cpp/index.html but I can't locate any C++ documentation about RecordBatchStreamWriter or RecordBatchWriter. Is this intentional?
> > >
> > > Thank you!
> > > Rares
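
For reference, here is a minimal self-contained sketch of the write path discussed above, using the same 0.16-era calls as the snippet in the thread (the WriteOneBatch helper, the one-column schema, and the values are made up for illustration). The first WriteRecordBatch call goes through CheckStarted, which emits the schema message before the batch:

  #include <memory>
  #include <string>
  #include <vector>

  #include <arrow/api.h>
  #include <arrow/io/api.h>
  #include <arrow/ipc/api.h>

  // Hypothetical helper: write one small record batch to `path` in the
  // IPC streaming format; the schema is written before the first batch.
  arrow::Status WriteOneBatch(const std::string& path) {
    // Build a one-column batch so there is something to write.
    arrow::Int64Builder builder;
    ARROW_RETURN_NOT_OK(builder.AppendValues({1, 2, 3}));
    std::shared_ptr<arrow::Array> values;
    ARROW_RETURN_NOT_OK(builder.Finish(&values));
    auto schema = arrow::schema({arrow::field("x", arrow::int64())});
    auto batch = arrow::RecordBatch::Make(schema, values->length(), {values});

    // Open the output file, as in the snippet above.
    auto open_result = arrow::io::FileOutputStream::Open(path, /*append=*/false);
    ARROW_RETURN_NOT_OK(open_result.status());
    std::shared_ptr<arrow::io::OutputStream> stream = open_result.ValueOrDie();

    // The stream writer holds the schema and writes it lazily: the first
    // WriteRecordBatch call runs CheckStarted, which emits the schema message.
    std::shared_ptr<arrow::ipc::RecordBatchWriter> writer;
    ARROW_RETURN_NOT_OK(arrow::ipc::RecordBatchStreamWriter::Open(
        stream.get(), schema, &writer));
    ARROW_RETURN_NOT_OK(writer->WriteRecordBatch(*batch));
    ARROW_RETURN_NOT_OK(writer->Close());
    return stream->Close();
  }

A file written this way is in the IPC streaming format, so it is read back with RecordBatchStreamReader in C++ or pa.ipc.open_stream in Python; pa.ipc.open_file expects the file format produced by RecordBatchFileWriter.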