(Apologies for the double-email)
In the original coalescing PR, an "AsyncContext" abstraction was
discussed. I could imagine being able to hold arbitrary
attributes/metrics for tasks that a scheduler could then take into
account, while making it easier for applications to thread through all
the di
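To make that idea slightly more concrete, here is a minimal sketch of the kind of per-task attributes such an AsyncContext might carry; the struct and field names are invented for illustration and are not the abstraction from that PR.

// Hypothetical sketch only: per-task attributes an I/O scheduler could weigh.
#include <cstdint>
#include <string>

struct AsyncContext {
  std::string consumer_id;   // which scan/query the task belongs to
  int64_t sequence_number;   // ordering hint, e.g. fragment or file index
  int priority;              // relative urgency when queues are saturated
  int64_t expected_bytes;    // rough I/O cost, for fair-share accounting
};
// A scheduler entry would then pair an AsyncContext with the task callable.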
Wes,
From the outline there it seems like a path forward. I think the mode
that would be helpful here is some sort of simple ordering/dependency
so that the scheduler knows not to schedule subtasks of B until all
subtasks of A have started (but not necessarily finished).
I think the other part w
I just wrote up a ticket about a general-purpose multi-consumer
scheduler API; do you think this could be the beginning of a
resolution?
https://issues.apache.org/jira/browse/ARROW-8667
We may also want to design in some affordances so that no consumer is
ever 100% blocked, even if that causes te
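As an illustration of the "no consumer is ever 100% blocked" property (not the ARROW-8667 design, just a toy sketch): give each consumer its own queue and dispatch round-robin across queues, so a consumer with a deep backlog cannot starve the others.

#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Illustrative only: one FIFO queue per consumer, dispatched round-robin.
class FairIoQueue {
 public:
  void Push(const std::string& consumer, std::function<void()> task) {
    queues_[consumer].push_back(std::move(task));
  }

  // Pop one task, rotating across consumers so everyone makes progress.
  bool PopNext(std::function<void()>* task) {
    if (queues_.empty()) return false;
    auto it = queues_.upper_bound(last_consumer_);
    if (it == queues_.end()) it = queues_.begin();
    *task = std::move(it->second.front());
    it->second.pop_front();
    last_consumer_ = it->first;
    if (it->second.empty()) queues_.erase(it);
    return true;
  }

 private:
  std::map<std::string, std::deque<std::function<void()>>> queues_;
  std::string last_consumer_;
};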
Francois,
Thanks for the pointers. I'll see if I can put together a
proof-of-concept; might that help the discussion? I agree it would be good
to make it format-agnostic. I'm also curious what thoughts you'd have
on how to manage cross-file parallelism (coalescing only helps within
a file). If we just
If we want to discuss IO APIs we should do that comprehensively.
There are various ways of expressing what we want to do (explicit
readahead, fadvise-like APIs, async APIs, etc.).
Regards
Antoine.
Le 30/04/2020 à 15:08, Francois Saint-Jacques a écrit :
> One more point,
>
> It would seem ben
One more point,
It would seem beneficial if we could express this as a
`RandomAccessFile::ReadAhead(vector<ReadRange>)` method: no async
buffering/coalescing would be needed. In the case of Parquet, we'd get
the _exact_ ranges computed from the metadata. This method would also
possibly benefit other filesystems s
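A rough sketch of what such a method could look like, reusing arrow::io::ReadRange; the ReadAheadFile wrapper and its synchronous body are purely illustrative.

#include <memory>
#include <utility>
#include <vector>
#include "arrow/buffer.h"
#include "arrow/io/interfaces.h"   // arrow::io::ReadRange, RandomAccessFile
#include "arrow/result.h"
#include "arrow/status.h"

// Hypothetical wrapper: announce the exact byte ranges that will be read
// (e.g. column chunk ranges from the Parquet footer) so they can be fetched
// before the reader asks for them. A real version would read asynchronously
// and keep the buffers; this sketch reads synchronously and discards them.
class ReadAheadFile {
 public:
  explicit ReadAheadFile(std::shared_ptr<arrow::io::RandomAccessFile> file)
      : file_(std::move(file)) {}

  arrow::Status ReadAhead(const std::vector<arrow::io::ReadRange>& ranges) {
    for (const auto& range : ranges) {
      ARROW_ASSIGN_OR_RAISE(auto buf,
                            file_->ReadAt(range.offset, range.length));
      (void)buf;  // sketch only: a real implementation would cache this
    }
    return arrow::Status::OK();
  }

 private:
  std::shared_ptr<arrow::io::RandomAccessFile> file_;
};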
Hello David,
I think that what you ask is achievable with the dataset API without
much effort. You'd have to insert the pre-buffering at
ParquetFileFormat::ScanFile [1]. The top-level Scanner::Scan method is
essentially a generator that looks like
flatmap(Iterator<Iterator<ScanTask>>). It consumes the
fragment in-order
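For readers less familiar with the scanner internals, the flatmap shape above is just a flattening generator over nested iterators; a toy stand-alone version (not the actual arrow::dataset iterator types) could look like:

#include <cstddef>
#include <utility>
#include <vector>

// Toy illustration of flatmap(Iterator<Iterator<T>>): walk an outer sequence
// of inner sequences and yield every element in order, one fragment at a time.
template <typename T>
class FlattenIterator {
 public:
  explicit FlattenIterator(std::vector<std::vector<T>> outer)
      : outer_(std::move(outer)) {}

  // Returns false when exhausted; otherwise writes the next element.
  bool Next(T* out) {
    while (i_ < outer_.size()) {
      if (j_ < outer_[i_].size()) {
        *out = outer_[i_][j_++];
        return true;
      }
      ++i_;      // current fragment exhausted, move to the next one
      j_ = 0;
    }
    return false;
  }

 private:
  std::vector<std::vector<T>> outer_;
  std::size_t i_ = 0;
  std::size_t j_ = 0;
};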
Sure, and we are still interested in collaborating. The main use case
we have is scanning datasets in order of the partition key; it seems
ordering is the only missing thing from Antoine's comments. However,
from briefly playing around with the Python API, an application could
manually order the fr
On Thu, 30 Apr 2020 at 04:06, Wes McKinney wrote:
> On Wed, Apr 29, 2020 at 6:54 PM David Li wrote:
> >
> > Ah, sorry, so I am being somewhat unclear here. Yes, you aren't
> > guaranteed to download all the files in order, but with more control,
> > you can make this more likely. You can also pr
On Wed, Apr 29, 2020 at 6:54 PM David Li wrote:
>
> Ah, sorry, so I am being somewhat unclear here. Yes, you aren't
> guaranteed to download all the files in order, but with more control,
> you can make this more likely. You can also prevent the case where due
> to scheduling, file N+1 doesn't eve
Ah, sorry, so I am being somewhat unclear here. Yes, you aren't
guaranteed to download all the files in order, but with more control,
you can make this more likely. You can also prevent the case where due
to scheduling, file N+1 doesn't even start downloading until after
file N+2, which can happen
Le 29/04/2020 à 23:30, David Li a écrit :
> Sure -
>
> The use case is to read a large partitioned dataset, consisting of
> tens or hundreds of Parquet files. A reader expects to scan through
> the data in order of the partition key. However, to improve
> performance, we'd like to begin loading
Sure -
The use case is to read a large partitioned dataset, consisting of
tens or hundreds of Parquet files. A reader expects to scan through
the data in order of the partition key. However, to improve
performance, we'd like to begin loading files N+1, N+2, ... N + k
while the consumer is still re
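As a sketch of that readahead window (the helper names and the use of std::async are placeholders, not the dataset API): keep up to k later files loading in the background while the consumer blocks only on the next file in partition order.

#include <cstddef>
#include <deque>
#include <future>
#include <iostream>
#include <string>
#include <vector>

// Placeholders for "load one file" and "consume it"; real code would return
// record batches rather than strings.
std::string LoadFile(const std::string& path) { return "contents of " + path; }
void Consume(const std::string& contents) { std::cout << contents << "\n"; }

// Consume files strictly in partition order while up to `k` later files are
// being loaded in the background. Illustrative sketch only.
void ScanInOrder(const std::vector<std::string>& paths, std::size_t k) {
  std::deque<std::future<std::string>> in_flight;
  std::size_t next = 0;
  while (next < paths.size() || !in_flight.empty()) {
    // Top up the readahead window: the file being consumed plus k more.
    while (next < paths.size() && in_flight.size() < k + 1) {
      in_flight.push_back(
          std::async(std::launch::async, LoadFile, paths[next++]));
    }
    // Block only on the file the consumer actually needs next.
    Consume(in_flight.front().get());
    in_flight.pop_front();
  }
}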
Le 29/04/2020 à 20:49, David Li a écrit :
>
> However, we noticed this doesn’t actually bring us the expected
> benefits. Consider files A, B, and C being buffered in parallel; right
> now, all I/O goes through an internal I/O pool, and so several
> operations for each of the three files get add
Hi all,
I’d like to follow up on this discussion. Thanks to Antoine, we now
have a read coalescing implementation in-tree which shows clear
performance benefits both when reading plain files and Parquet
files[1]. We now have some follow-up work where we think the design
and implementation could be
Thanks. I've set up an AWS account for my own testing for now. I've
also submitted a PR to add a basic benchmark which can be run
self-contained, against a local Minio instance, or against S3:
https://github.com/apache/arrow/pull/6675
I ran the benchmark from my local machine, and I can test from
On Thu, Mar 19, 2020 at 10:04 AM David Li wrote:
>
> > That's why it's important that we set ourselves up to do performance
> > testing in a realistic environment in AWS rather than simulating it.
>
> For my clarification, what are the plans for this (if any)? I couldn't
> find any prior discussi
> That's why it's important that we set ourselves up to do performance testing
> in a realistic environment in AWS rather than simulating it.
For my clarification, what are the plans for this (if any)? I couldn't
find any prior discussion, though it sounds like the discussion around
cloud CI capa
For us it applies to S3-like systems, not only S3 itself, at least.
It does make sense to limit it to some filesystems. The behavior would
be opt-in at the Parquet reader level, so at the Datasets or
Filesystem layer we can take care of enabling the flag for filesystems
where it actually helps.
I
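For example, assuming the opt-in flag ends up as a toggle on parquet::ArrowReaderProperties (the set_pre_buffer name below is a guess, not a settled API), the datasets layer could flip it only for remote filesystems:

#include <memory>
#include <utility>
#include "arrow/io/interfaces.h"
#include "arrow/status.h"
#include "parquet/arrow/reader.h"
#include "parquet/properties.h"

// Sketch: turn pre-buffering on only for high-latency (S3-like) filesystems.
// set_pre_buffer() is an assumed name for the reader-level toggle.
arrow::Status OpenReader(std::shared_ptr<arrow::io::RandomAccessFile> file,
                         bool is_remote_filesystem,
                         std::unique_ptr<parquet::arrow::FileReader>* out) {
  parquet::ArrowReaderProperties arrow_props;
  arrow_props.set_pre_buffer(is_remote_filesystem);

  parquet::arrow::FileReaderBuilder builder;
  ARROW_RETURN_NOT_OK(builder.Open(std::move(file)));
  return builder.properties(arrow_props)->Build(out);
}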
Le 18/03/2020 à 18:30, David Li a écrit :
>> Instead of S3, you can use the Slow streams and Slow filesystem
>> implementations. It may better protect against varying external conditions.
>
> I think we'd want several different benchmarks - we want to ensure we
> don't regress local filesystem
> Instead of S3, you can use the Slow streams and Slow filesystem
> implementations. It may better protect against varying external conditions.
I think we'd want several different benchmarks - we want to ensure we
don't regress local filesystem performance, and we also want to
measure in an actu
On Wed, Mar 18, 2020 at 11:42 AM Antoine Pitrou wrote:
>
>
> Le 18/03/2020 à 17:36, David Li a écrit :
> > Hi all,
> >
> > Thanks to Antoine for implementing the core read coalescing logic.
> >
> > We've taken a look at what else needs to be done to get this working,
> > and it sounds like the fol
hi David,
Yes, this sounds right to me. I would say that we should come up with
the public API for column prebuffering ASAP and then get to work on
implementing it and working to maximize the throughput.
- Wes
On Wed, Mar 18, 2020 at 11:37 AM David Li wrote:
>
> Hi all,
>
> Thanks to Antoine fo
Le 18/03/2020 à 17:36, David Li a écrit :
> Hi all,
>
> Thanks to Antoine for implementing the core read coalescing logic.
>
> We've taken a look at what else needs to be done to get this working,
> and it sounds like the following changes would be worthwhile,
> independent of the rest of the o
Hi all,
Thanks to Antoine for implementing the core read coalescing logic.
We've taken a look at what else needs to be done to get this working,
and it sounds like the following changes would be worthwhile,
independent of the rest of the optimizations we discussed:
- Add benchmarks of the curren
Catching up on questions here...
> Typically you can solve this by having enough IO concurrency at once :-)
> I'm not sure having sophisticated global coordination (based on which
> algorithms) would bring anything. Would you care to elaborate?
We aren't proposing *sophisticated* global coordina
On Thu, Feb 6, 2020 at 1:30 PM Antoine Pitrou wrote:
>
>
> Le 06/02/2020 à 20:20, Wes McKinney a écrit :
> >> Actually, on a more high-level basis, is the goal to prefetch for
> >> sequential consumption of row groups?
> >>
> >
> > Essentially yes. One "easy" optimization is to prefetch the entire
Le 06/02/2020 à 20:20, Wes McKinney a écrit :
>> Actually, on a more high-level basis, is the goal to prefetch for
>> sequential consumption of row groups?
>>
>
> Essentially yes. One "easy" optimization is to prefetch the entire
> serialized row group. This is an evolution of that idea where we
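To make "prefetch the entire serialized row group" concrete, the byte range can be derived from the footer roughly as follows; this is a sketch that assumes at least one column and skips error handling.

#include <algorithm>
#include <cstdint>
#include <limits>
#include <memory>
#include "arrow/io/interfaces.h"  // arrow::io::ReadRange
#include "parquet/metadata.h"

// Compute the contiguous byte range covering every column chunk of one row
// group, straight from the footer metadata. Sketch only.
arrow::io::ReadRange RowGroupByteRange(const parquet::FileMetaData& metadata,
                                       int row_group) {
  std::unique_ptr<parquet::RowGroupMetaData> rg = metadata.RowGroup(row_group);
  int64_t start = std::numeric_limits<int64_t>::max();
  int64_t end = 0;
  for (int i = 0; i < rg->num_columns(); ++i) {
    std::unique_ptr<parquet::ColumnChunkMetaData> col = rg->ColumnChunk(i);
    int64_t col_start = col->data_page_offset();
    if (col->has_dictionary_page() && col->dictionary_page_offset() > 0) {
      col_start = std::min(col_start, col->dictionary_page_offset());
    }
    start = std::min(start, col_start);
    end = std::max(end, col_start + col->total_compressed_size());
  }
  return {start, end - start};
}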
On Thu, Feb 6, 2020, 12:42 PM Antoine Pitrou wrote:
>
> Le 06/02/2020 à 19:40, Antoine Pitrou a écrit :
> >
> > Le 06/02/2020 à 19:37, Wes McKinney a écrit :
> >> On Thu, Feb 6, 2020, 12:12 PM Antoine Pitrou
> wrote:
> >>
> >>> Le 06/02/2020 à 16:26, Wes McKinney a écrit :
>
> This see
On Thu, Feb 6, 2020, 12:41 PM Antoine Pitrou wrote:
>
> Le 06/02/2020 à 19:37, Wes McKinney a écrit :
> > On Thu, Feb 6, 2020, 12:12 PM Antoine Pitrou wrote:
> >
> >> Le 06/02/2020 à 16:26, Wes McKinney a écrit :
> >>>
> >>> This seems useful, too. It becomes a question of where do you want to
>
Le 06/02/2020 à 19:40, Antoine Pitrou a écrit :
>
> Le 06/02/2020 à 19:37, Wes McKinney a écrit :
>> On Thu, Feb 6, 2020, 12:12 PM Antoine Pitrou wrote:
>>
>>> Le 06/02/2020 à 16:26, Wes McKinney a écrit :
This seems useful, too. It becomes a question of where do you want to
mana
Le 06/02/2020 à 19:37, Wes McKinney a écrit :
> On Thu, Feb 6, 2020, 12:12 PM Antoine Pitrou wrote:
>
>> Le 06/02/2020 à 16:26, Wes McKinney a écrit :
>>>
>>> This seems useful, too. It becomes a question of where do you want to
>>> manage the cached memory segments, however you obtain them. I'
On Thu, Feb 6, 2020, 12:12 PM Antoine Pitrou wrote:
>
> Le 06/02/2020 à 16:26, Wes McKinney a écrit :
> >
> > This seems useful, too. It becomes a question of where do you want to
> > manage the cached memory segments, however you obtain them. I'm
> > arguing that we should not have much custom c
Le 06/02/2020 à 16:26, Wes McKinney a écrit :
>
> This seems useful, too. It becomes a question of where do you want to
> manage the cached memory segments, however you obtain them. I'm
> arguing that we should not have much custom code in the Parquet
> library to manage the prefetched segments
Le 06/02/2020 à 17:07, Wes McKinney a écrit :
> In case folks are interested in how some other systems deal with IO
> management / scheduling, the comments in
>
> https://github.com/apache/impala/blob/master/be/src/runtime/io/disk-io-mgr.h
>
> and related files might be interesting
Thanks. Th
In case folks are interested in how some other systems deal with IO
management / scheduling, the comments in
https://github.com/apache/impala/blob/master/be/src/runtime/io/disk-io-mgr.h
and related files might be interesting
On Thu, Feb 6, 2020 at 9:26 AM Wes McKinney wrote:
>
> On Thu, Feb 6,
On Thu, Feb 6, 2020 at 2:46 AM Antoine Pitrou wrote:
>
> On Wed, 5 Feb 2020 15:46:15 -0600
> Wes McKinney wrote:
> >
> > I'll comment in more detail on some of the other items in due course,
> > but I think this should be handled by an implementation of
> > RandomAccessFile (that wraps a naked Ra
On Wed, 5 Feb 2020 16:37:17 -0500
David Li wrote:
>
> As a separate step, prefetching/caching should also make use of a
> global (or otherwise shared) IO thread pool, so that parallel reads of
> different files implicitly coordinate work with each other as well.
> Then, you could queue up reads o
On Wed, 5 Feb 2020 15:46:15 -0600
Wes McKinney wrote:
>
> I'll comment in more detail on some of the other items in due course,
> but I think this should be handled by an implementation of
> RandomAccessFile (that wraps a naked RandomAccessFile) with some
> additional methods, rather than adding
On Wed, Feb 5, 2020 at 3:37 PM David Li wrote:
>
> Hi Antoine and Wes,
>
> Thanks for the feedback. Yes, we should definitely consider these as
> separate features.
>
> I agree that it makes sense for the file API (or a derived API) to
> expose a generic CacheRanges or PrebufferRanges API. It coul
Hi Antoine and Wes,
Thanks for the feedback. Yes, we should definitely consider these as
separate features.
I agree that it makes sense for the file API (or a derived API) to
expose a generic CacheRanges or PrebufferRanges API. It could then do
coalescing and prefetching as desired based on the a
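A sketch of what such a CacheRanges-style wrapper could look like (CachingFile and its eager, synchronous fetching are illustrative placeholders; a real version would coalesce and prefetch asynchronously):

#include <cstdint>
#include <memory>
#include <utility>
#include <vector>
#include "arrow/buffer.h"
#include "arrow/io/interfaces.h"
#include "arrow/result.h"
#include "arrow/status.h"

// Illustrative only: cache explicitly announced ranges, then serve reads
// that fall entirely inside a cached range from memory.
class CachingFile {
 public:
  explicit CachingFile(std::shared_ptr<arrow::io::RandomAccessFile> file)
      : file_(std::move(file)) {}

  // Announce the ranges that will be read (e.g. column chunks from the
  // Parquet footer) and fetch them up front.
  arrow::Status CacheRanges(const std::vector<arrow::io::ReadRange>& ranges) {
    for (const auto& r : ranges) {
      ARROW_ASSIGN_OR_RAISE(auto buf, file_->ReadAt(r.offset, r.length));
      cached_.push_back({r, std::move(buf)});
    }
    return arrow::Status::OK();
  }

  arrow::Result<std::shared_ptr<arrow::Buffer>> ReadAt(int64_t offset,
                                                       int64_t length) {
    for (const auto& entry : cached_) {
      const auto& r = entry.range;
      if (offset >= r.offset && offset + length <= r.offset + r.length) {
        return arrow::SliceBuffer(entry.data, offset - r.offset, length);
      }
    }
    return file_->ReadAt(offset, length);  // cache miss: fall through
  }

 private:
  struct Entry {
    arrow::io::ReadRange range;
    std::shared_ptr<arrow::Buffer> data;
  };
  std::shared_ptr<arrow::io::RandomAccessFile> file_;
  std::vector<Entry> cached_;
};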
I agree with separating the problem into its constituent concerns to
make sure that we are developing appropriate abstractions.
Speaking specifically about the Parquet codebase, the way that we
access a particular ColumnChunk in a row group is fairly simplistic.
See the ReaderProperties::GetStream
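For reference, that path boils down to opening a positional stream per column chunk; a hedged example of the call (the exact signature may differ across versions):

#include <memory>
#include <utility>
#include "arrow/io/interfaces.h"
#include "parquet/properties.h"

// Open a positional stream over one column chunk's bytes; each chunk gets
// its own stream, and with a remote file each becomes its own small read.
std::shared_ptr<arrow::io::InputStream> OpenColumnChunkStream(
    std::shared_ptr<arrow::io::RandomAccessFile> source,
    int64_t chunk_offset, int64_t chunk_length) {
  parquet::ReaderProperties props;  // defaults: no buffering
  return props.GetStream(std::move(source), chunk_offset, chunk_length);
}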
Hi David,
I think we should discuss this as individual features.
> Read Coalescing: from Parquet metadata, we know exactly which byte
> ranges of a file will be read, and can “cheat” in the S3 IO layer by
> fetching them in advance
It seems there are two things here: coalescing individual reads,
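The coalescing half is essentially a merge over sorted byte ranges; a stand-alone sketch (the hole-size threshold is an arbitrary knob, not a tuned value):

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ByteRange {
  int64_t offset;
  int64_t length;
};

// Merge ranges whose gap is at most `hole_size_limit`, so nearby column
// chunks become one larger request. Illustrative sketch only.
std::vector<ByteRange> CoalesceRanges(std::vector<ByteRange> ranges,
                                      int64_t hole_size_limit) {
  if (ranges.empty()) return ranges;
  std::sort(ranges.begin(), ranges.end(),
            [](const ByteRange& a, const ByteRange& b) {
              return a.offset < b.offset;
            });
  std::vector<ByteRange> out;
  out.push_back(ranges[0]);
  for (std::size_t i = 1; i < ranges.size(); ++i) {
    ByteRange& last = out.back();
    int64_t last_end = last.offset + last.length;
    if (ranges[i].offset - last_end <= hole_size_limit) {
      // Extend the previous request to swallow the small hole.
      last.length = std::max(last_end, ranges[i].offset + ranges[i].length) -
                    last.offset;
    } else {
      out.push_back(ranges[i]);
    }
  }
  return out;
}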