On Thu, Feb 6, 2020 at 2:46 AM Antoine Pitrou <solip...@pitrou.net> wrote:
>
> On Wed, 5 Feb 2020 15:46:15 -0600
> Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > I'll comment in more detail on some of the other items in due course,
> > but I think this should be handled by an implementation of
> > RandomAccessFile (that wraps a naked RandomAccessFile) with some
> > additional methods, rather than adding this to the abstract
> > RandomAccessFile interface, e.g.
> >
> > class CachingInputFile : public RandomAccessFile {
> >  public:
> >    CachingInputFile(std::shared_ptr<RandomAccessFile> naked_file);
> >    Status CacheRanges(...);
> > };
> >
> > etc.
>
> IMHO it may be more beneficial to expose it as an asynchronous API on
> RandomAccessFile, for example:
> class RandomAccessFile {
>  public:
>   struct Range {
>     int64_t offset;
>     int64_t length;
>   };
>
>   std::vector<Promise<std::shared_ptr<Buffer>>>
>     ReadRangesAsync(std::vector<Range> ranges);
> };
>
>
> The reason is that some APIs such as the C++ AWS S3 API have their own
> async support, which may be beneficial to use over a generic Arrow
> thread-pool implementation.
>
> Also, by returning a Promise instead of simply caching the results, you
> make it easier to handle the lifetime of the results.

This seems useful, too. It becomes a question of where you want to
manage the cached memory segments, however they are obtained. I'm
arguing that we should not have much custom code in the Parquet
library to manage the prefetched segments (and to provide the correct
buffer slice to each column reader when it needs one); instead, we
should encapsulate this logic so it can be reused.

The API I proposed was just a mock-up. I agree it would make sense for
the prefetching to occur asynchronously, so that a column reader can
proceed as soon as its coalesced chunk has been prefetched rather than
having to wait synchronously for all prefetching to complete.

>
> (Promise<T> can be something like std::future<Result<T>>, though
> std::future<> has annoying limitations and we may want to write our own
> instead)
>
> Regards
>
> Antoine.