On Thu, Feb 6, 2020, 12:12 PM Antoine Pitrou <anto...@python.org> wrote:
>
> On 06/02/2020 at 16:26, Wes McKinney wrote:
> >
> > This seems useful, too. It becomes a question of where do you want to
> > manage the cached memory segments, however you obtain them. I'm
> > arguing that we should not have much custom code in the Parquet
> > library to manage the prefetched segments (and providing the correct
> > buffer slice to each column reader when they need it), and instead
> > encapsulate this logic so it can be reused.
>
> I see, so RandomAccessFile would have some associative caching logic to
> find whether the exact requested range was cached and then return it to
> the caller? That sounds doable. How is lifetime handled then? Are
> cached buffers kept on the RandomAccessFile until they are requested, at
> which point their ownership is transferred to the caller?
>

This seems like too much to try to build into RandomAccessFile. I would
suggest a class that wraps a random access file and manages cached
segments and their lifetimes through explicit APIs.

Where to put the "async multiple range request" API is a separate
question, though. It probably makes sense to start writing some working
code and sort it out there.

> Regards
>
> Antoine.
>
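
To make the wrapper idea concrete, here is a rough sketch in Python (chosen for brevity; the real code would be C++). Everything in it is hypothetical: the class name `CachedRangeReader` and the methods `prefetch`, `take`, `cached_ranges` and `clear` are made up for illustration and are not an existing Arrow or Parquet API. It assumes exact-range lookup and that ownership of a cached buffer passes to the caller on the first request, which is only one possible answer to the lifetime question above.

```python
import io
from typing import BinaryIO, Dict, Iterable, List, Tuple

Range = Tuple[int, int]  # (offset, length)


class CachedRangeReader:
    """Hypothetical wrapper around a seekable file; not a real Arrow class."""

    def __init__(self, file: BinaryIO) -> None:
        self._file = file
        self._segments: Dict[Range, bytes] = {}

    def prefetch(self, ranges: Iterable[Range]) -> None:
        # Stand-in for an "async multiple range request": here the ranges
        # are simply read one after another and kept until requested.
        for offset, length in ranges:
            self._file.seek(offset)
            self._segments[(offset, length)] = self._file.read(length)

    def take(self, offset: int, length: int) -> bytes:
        # Exact-range match only. On a hit the buffer is removed from the
        # cache, so the caller becomes its sole owner.
        key = (offset, length)
        if key in self._segments:
            return self._segments.pop(key)
        # On a miss, fall back to a plain read of the wrapped file.
        self._file.seek(offset)
        return self._file.read(length)

    def cached_ranges(self) -> List[Range]:
        # Ranges still held by the cache, sorted by offset.
        return sorted(self._segments)

    def clear(self) -> None:
        # Explicitly drop every segment that was never requested.
        self._segments.clear()


# Usage: an in-memory file stands in for the random access file.
reader = CachedRangeReader(io.BytesIO(bytes(range(100))))
reader.prefetch([(10, 5), (50, 4)])
column_chunk = reader.take(10, 5)   # served from the cache, then evicted
remaining = reader.cached_ranges()  # only the (50, 4) segment is left
reader.clear()
```

With this shape the Parquet reader would only decide which ranges to prefetch and when to call `take`; the bookkeeping of segments and their lifetimes stays in one reusable place.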