I believe the concern is that reading a record batch from a
RecordBatchStreamReader triggers the MADV_WILLNEED advice to be sent to the
OS before any data is accessed (and regardless of whether or not that data
is ever accessed).

I'm pretty sure the `RecordBatchStreamReader` uses
`MemoryMappedFile::ReadAt` and that function issues the MADV_WILLNEED
advice [1] for the range being read.  This is contrary to the user
expectation that only the data actually accessed would be loaded into memory.

[1]
https://github.com/apache/arrow/blob/ca2f4d68e834e600852d5af36dc2190741e33118/cpp/src/arrow/io/file.cc#L677
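
For anyone unfamiliar with the advice itself, here is a minimal sketch of
what it looks like at the OS level, using Python's standard `mmap` module
rather than Arrow. The function name `read_with_willneed` is made up for
illustration, and this is not Arrow's implementation; it only shows the hint
being issued for a requested range before any byte in it is touched.

```python
import mmap


def read_with_willneed(path, offset, length):
    """Map `path` read-only, hint that [offset, offset + length) will be
    needed, then return those bytes. Hypothetical helper, not an Arrow API."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # madvise needs a page-aligned start, so round the offset down.
            page = mmap.PAGESIZE
            aligned = (offset // page) * page
            # madvise and MADV_WILLNEED are only present on some platforms
            # (Python 3.8+ on systems that provide the call).
            if hasattr(mm, "madvise") and hasattr(mmap, "MADV_WILLNEED"):
                try:
                    # The hint is sent here, before any data is accessed.
                    mm.madvise(mmap.MADV_WILLNEED, aligned,
                               offset + length - aligned)
                except OSError:
                    # The advice is only a hint; ignore it if rejected.
                    pass
            # Slicing copies the bytes out, which is the actual access.
            return mm[offset:offset + length]
```

The kernel may start reading the advised pages in the background as soon as
the hint arrives, so the pages can end up resident even if the caller never
looks at most of them.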

On Tue, Jan 28, 2025 at 7:15 AM Aldrin <octalene....@pm.me> wrote:

> > Then you should just use a memory-mapped file.
>
> Unless I'm misunderstanding their original message, I believe they are
> using a memory-mapped file. I'm not sure if other suggestions helped
> address the issue, but my understanding was that they were somehow
> triggering reads against the whole file anyways.
>
>
> I'm not sure why a Table is necessary (presumably some useful method in
> the API?) if accesses are sparse relative to the entire table; that sounds
> more aligned to RecordBatch access. I would think that any use of a Table
> method is going to trigger reads to every batch. I would also think that
> this scenario has 2 opportunities to do processing without triggering a
> scan of the whole file:
> 1. when a RecordBatch is read into memory
> 2. on the RecordBatches accumulated so far (a Table instance can be
> constructed from them without copies, I am pretty sure)
>
> I have little experience with mmap, so I don't have any particular
> thoughts there. Some extra information about how random access into the
> table occurs would be helpful, though.
>
>
>
>
> On Tue, Jan 28, 2025 at 01:14, Antoine Pitrou <anto...@python.org> wrote:
>
> On Sun, 26 Jan 2025 10:48:48 -0800
> Sharvil Nanavati <shar...@lmnt.com> wrote:
> > In a different context, fetching batches one-by-one would be a good way to
> > control when the disk read takes place.
> >
> > In my context, I'm looking for a way to construct a Table without
> > performing the bulk of the IO operations until the memory is accessed. I
> > need random access to the table and my accesses are often sparse relative
> > to the size of the entire table. Obviously there has to be *some* IO to
> > read the schema and offsets, but that's tiny relative to the data itself.
>
> Then you should just use a memory-mapped file.
>
> Regards
>
> Antoine.
>
>
>
