On Tue, Nov 17, 2020 at 5:41 PM Rares Vernica <rvern...@gmail.com> wrote:
>
> Hi Antoine,
>
> On Tue, Nov 17, 2020 at 2:34 AM Antoine Pitrou <anto...@python.org> wrote:
> >
> > On 17/11/2020 at 03:34, Rares Vernica wrote:
> > >
> > > I'm using an arrow::io::BufferReader and
> > > arrow::ipc::RecordBatchStreamReader to read an arrow::RecordBatch from a
> > > file. There is only one batch in the file so I do a single
> > > RecordBatchStreamReader::ReadNext call. I store the populated
> > > RecordBatch in memory for reuse (cache). The memory buffer wrapped by the
> > > BufferReader is reallocated.
> >
> > What do you mean by "reallocated"?  As long as you keep a strong
> > reference to a RecordBatch (through shared_ptr), the buffers are kept
> > intact.  This is an intended consequence of the Buffer design and the
> > pervasive use of shared_ptr.
>
> I have something like this:
>
> std::unique_ptr<char[]> data;
> data = std::make_unique<char[]>(...);
> // populate data
>
> std::shared_ptr<arrow::io::BufferReader> bufferReader;
> std::shared_ptr<arrow::RecordBatchReader> batchReader;
> std::shared_ptr<arrow::RecordBatch> batch;
>
> bufferReader = std::make_shared<arrow::io::BufferReader>(data.get());
> // Arrow < 0.17.0
> arrow::ipc::RecordBatchStreamReader::Open(bufferReader, &batchReader);
> batchReader->ReadNext(&batch);
>
> data = std::make_unique<char[]>(...);
> // populate "data"
>
> Is "batch" still a valid RecordBatch after the "data" buffer has been
> reallocated and repopulated?

No, because "bufferReader" cannot reason about memory ownership: it
only wraps the pointer it was given, so "batch" still points into the
old allocation after "data" is replaced. If you want to extend the
lifetime of some foreign data source, one approach is to create a
subclass of Buffer that owns it. The only real alternative I can think
of would be to copy the data from "data" into a Buffer allocated from a
MemoryPool before instantiating the BufferReader.
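
To illustrate the copy approach, here is a minimal sketch with no Arrow
dependency, so it compiles on its own. OwnedBuffer, Batch, ReadFromCopy
and CachedBatchSurvivesReallocation are hypothetical stand-ins I made up
for this example, not Arrow types; the only point is that the cached
object shares ownership of a private copy instead of pointing into
"data".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an owning buffer: it holds its own copy of the bytes.
struct OwnedBuffer {
  std::vector<std::uint8_t> bytes;
};

// Hypothetical stand-in for a decoded batch. It shares ownership of the
// memory it points into, so that memory lives as long as the batch does.
struct Batch {
  std::shared_ptr<OwnedBuffer> backing;  // keeps the bytes alive
  const std::uint8_t* values;            // points into backing->bytes
  std::size_t length;
};

// Copy the foreign bytes first, then build the batch on top of the copy.
Batch ReadFromCopy(const char* data, std::size_t size) {
  auto owned = std::make_shared<OwnedBuffer>();
  const auto* begin = reinterpret_cast<const std::uint8_t*>(data);
  owned->bytes.assign(begin, begin + size);
  const std::uint8_t* values = owned->bytes.data();
  return Batch{std::move(owned), values, size};
}

// Mirrors the sequence from the question: populate "data", read a batch,
// then replace and repopulate "data". Returns true if the cached batch
// still sees the original bytes.
bool CachedBatchSurvivesReallocation() {
  auto data = std::make_unique<char[]>(4);
  std::memcpy(data.get(), "abcd", 4);

  Batch batch = ReadFromCopy(data.get(), 4);

  data = std::make_unique<char[]>(4);  // the old allocation is freed here
  std::memcpy(data.get(), "wxyz", 4);

  return batch.length == 4 && std::memcmp(batch.values, "abcd", 4) == 0;
}
```

With Arrow itself, I would expect the copy to go into a Buffer obtained
from arrow::AllocateBuffer and the BufferReader to be constructed from
that Buffer rather than from the raw pointer, at the cost of one extra
memcpy per file.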

> Thanks!
> Rares
