I believe the concern is that reading a record batch from a RecordBatchStreamReader triggers the MADV_WILLNEED advice to be sent to the OS before any data is accessed (and regardless of whether or not that data is ever accessed).
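To make the concern concrete, here is a rough sketch using plain Python `mmap` rather than Arrow's own code. The helper names `open_mapping` and `read_at` are made up for this illustration, and the sketch advises the whole mapping for simplicity, which is an assumption and not necessarily what Arrow does. With the advice, the kernel may begin reading pages before anything touches them; without it, pages are faulted in only when accessed.

```python
import mmap
import os
import tempfile


def open_mapping(path, advise_willneed):
    """Map `path` read-only and optionally ask the kernel to prefetch it."""
    with open(path, "rb") as f:
        # mmap duplicates the descriptor, so the file object can be closed.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # MADV_WILLNEED is only exposed on platforms that support madvise.
    if advise_willneed and hasattr(mmap, "MADV_WILLNEED"):
        # Hint only: the kernel may start reading ahead immediately,
        # whether or not these pages are ever touched.
        mm.madvise(mmap.MADV_WILLNEED)
    return mm


def read_at(mm, position, nbytes):
    """Copy `nbytes` starting at `position`; this access faults pages in."""
    return mm[position:position + nbytes]


# Hypothetical stand-in for an IPC file: four pages of zeros.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * (4 * mmap.PAGESIZE))
    mm = open_mapping(path, advise_willneed=True)
    chunk = read_at(mm, mmap.PAGESIZE, 16)  # sparse access to one page
    mm.close()
finally:
    os.remove(path)
```

Passing `advise_willneed=False` gives the behaviour I think the user expected: only the page containing the 16 bytes needs to be resident.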
I'm pretty sure the `RecordBatchStreamReader` uses `MemoryMappedFile::ReadAt` and that function triggers the MADV_WILLNEED[1]. This is contrary to the user expectation that only the data actually accessed would be loaded into memory.

[1] https://github.com/apache/arrow/blob/ca2f4d68e834e600852d5af36dc2190741e33118/cpp/src/arrow/io/file.cc#L677

On Tue, Jan 28, 2025 at 7:15 AM Aldrin <octalene....@pm.me> wrote:
>
> > Then you should just use a memory-mapped file.
>
> Unless I'm misunderstanding their original message, I believe they are
> using a memory-mapped file. I'm not sure if other suggestions helped
> address the issue, but my understanding was that they were somehow
> triggering reads against the whole file anyways.
>
> I'm not sure why a Table is necessary (presumably some useful method in
> the API?) if accesses are sparse relative to the entire table; that sounds
> more aligned to RecordBatch access. I would think that any use of a Table
> method is going to trigger reads to every batch. I would also think that
> this scenario has 2 opportunities to do processing without triggering a
> scan of the whole file:
> 1. when a RecordBatch is read into memory
> 2. on the RecordBatches accumulated so far (a Table instance can be
>    constructed from them without copies, I am pretty sure)
>
> I have little experience with mmap, so I don't have any particular
> thoughts there. Some extra information about how random access into the
> table occurs would be helpful, though.
>
> Sent from Proton Mail <https://proton.me/mail/home> for iOS
>
> On Tue, Jan 28, 2025 at 01:14, Antoine Pitrou <anto...@python.org> wrote:
>
> On Sun, 26 Jan 2025 10:48:48 -0800
> Sharvil Nanavati <shar...@lmnt.com> wrote:
> > In a different context, fetching batches one-by-one would be a good way to
> > control when the disk read takes place.
> >
> > In my context, I'm looking for a way to construct a Table without
> > performing the bulk of the IO operations until the memory is accessed. I
> > need random access to the table and my accesses are often sparse relative
> > to the size of the entire table. Obviously there has to be *some* IO to
> > read the schema and offsets, but that's tiny relative to the data itself.
>
> Then you should just use a memory-mapped file.
>
> Regards
>
> Antoine.