On Tue, Nov 17, 2020 at 5:41 PM Rares Vernica <rvern...@gmail.com> wrote:
>
> Hi Antoine,
>
> On Tue, Nov 17, 2020 at 2:34 AM Antoine Pitrou <anto...@python.org> wrote:
> >
> > On 17/11/2020 at 03:34, Rares Vernica wrote:
> > >
> > > I'm using an arrow::io::BufferReader and
> > > arrow::ipc::RecordBatchStreamReader to read an arrow::RecordBatch from a
> > > file. There is only one batch in the file so I do a single
> > > RecordBatchStreamReader::ReadNext call. I store the populated RecordBatch
> > > in memory for reuse (cache). The memory buffer wrapped by the BufferReader
> > > is reallocated.
> >
> > What do you mean with "reallocated"? As long as you keep a strong
> > reference to a RecordBatch (through shared_ptr), the buffers are kept
> > intact. This is an intended consequence of the Buffer design and the
> > pervasive use of shared_ptr.
>
> I have something like this:
>
> std::unique_ptr<char[]> data;
> data = std::make_unique<char[]>(...);
> // populate data
>
> std::shared_ptr<arrow::io::BufferReader> bufferReader;
> std::shared_ptr<arrow::RecordBatchReader> batchReader;
> std::shared_ptr<arrow::RecordBatch> batch;
>
> bufferReader = std::make_shared<arrow::io::BufferReader>(data.get());
> arrow::ipc::RecordBatchStreamReader::Open(bufferReader, &batchReader); // Arrow < 0.17.0
> batchReader->ReadNext(&batch);
>
> data = std::make_unique<char[]>(...);
> // populate "data"
>
> Is "batch" still a valid RecordBatch after the "data" buffer has been
> reallocated and repopulated?
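
No, because "bufferReader" cannot reason about memory ownership. It only
wraps the raw pointer it was given, so nothing ties the lifetime of the
"data" allocation to "batch". Once "data" is reassigned, the old allocation
is freed and "batch" should be treated as invalid.

If you want to extend the lifetime of some foreign data source, one
approach is to create a subclass of Buffer that holds on to that source.
The only real alternative I can think of would be to copy the data from
"data" into a Buffer allocated from a MemoryPool before instantiating the
BufferReader.

> Thanks!
> Rares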
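
To make the copy-before-read idea concrete, here is a minimal sketch in plain C++. It deliberately does not call any Arrow API: OwnedBytes, BatchView, ReadBatch and CacheThenReuseSource are hypothetical stand-ins (an owning buffer, a zero-copy batch, and the reader) invented for illustration only. The point is just that the cached view holds a strong reference to storage it owns, so reallocating the original "data" cannot invalidate it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a buffer that owns its bytes (not Arrow's Buffer).
class OwnedBytes {
 public:
  // Copies `size` bytes out of foreign memory, so the caller may later
  // free or reuse the source without affecting this object.
  OwnedBytes(const char* src, std::size_t size) : bytes_(src, src + size) {}
  const char* data() const { return bytes_.data(); }
  std::size_t size() const { return bytes_.size(); }

 private:
  std::vector<char> bytes_;
};

// Hypothetical stand-in for a zero-copy batch: a view into the storage
// plus a shared_ptr that keeps that storage alive.
struct BatchView {
  std::shared_ptr<const OwnedBytes> storage;
  const char* begin;
  std::size_t length;
  std::string ToString() const { return std::string(begin, length); }
};

// Hypothetical stand-in for the reader: returns a view without copying.
BatchView ReadBatch(std::shared_ptr<const OwnedBytes> storage) {
  const char* begin = storage->data();
  const std::size_t length = storage->size();
  return BatchView{std::move(storage), begin, length};
}

// Mirrors the sequence in the question: populate, read, cache, reallocate.
std::string CacheThenReuseSource() {
  const char first[] = "batch-1";
  const char second[] = "batch-2";
  const std::size_t n = sizeof(first) - 1;  // both payloads have this length

  auto data = std::make_unique<char[]>(n);
  std::memcpy(data.get(), first, n);

  // Copy into owned storage *before* creating the reader-side view.
  auto owned = std::make_shared<OwnedBytes>(data.get(), n);
  BatchView cached = ReadBatch(owned);

  // Reallocate and repopulate the source, as in the original snippet.
  data = std::make_unique<char[]>(n);
  std::memcpy(data.get(), second, n);

  // The cached view still reads from its own copy.
  return cached.ToString();
}
```

Calling CacheThenReuseSource() returns the first payload even though "data" has been replaced. In real Arrow code the owned storage would be a Buffer allocated from a MemoryPool, as described above; the cost is one extra copy of the stream.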