Thanks for the input. I appreciate you both taking the time to look through this. I'll consolidate the points here.
From Wes:

> I hypothesize that the bottom of the stack is a thread pool with a
> queue-per-thread that implements work stealing.

Yes. I think there may be two pools (one for I/O and one for CPU), but for the work I'm intending (unblocking deadlocks and getting I/O off the CPU pool) what is there today is sufficient. Moving to work stealing would be a secondary priority. It's possible a query scheduler might have special needs. For example, Quickstep wants a dedicated "master scheduler" thread which cannot be stolen from.

> Some code paths might use this low-level task API directly

Yes, I have no intention of completely shadowing the lower-level executor, only adding optional utilities useful for building pipelines.

> I've brought this up in the past, but if we are comfortable with more threads
> than CPU cores...let me know if it doesn't make sense.

It makes sense, and other projects (e.g. C#, Postgres) use this kind of "more threads than CPUs" model. It can be simpler because it doesn't require you to split a long-running job with blocking waits into smaller jobs, but it won't offer more capability/functionality. Both the Quickstep model and the asynchronous generators model have mechanisms in place so that "follow-up tasks" which are ready to run will run immediately without going to the back of the queue. This is done by running a callback immediately instead of submitting it to the thread pool.

The downsides of the "more threads" approach are:

* It does not support the single-threaded case well.
* It puts scheduling in the hands of the OS. This can actually prevent the benefit you described: if you create a new thread / task pool to address some "run right away" tasks, there is no guarantee the new thread will be given a core by the OS.
* It often leads to more communication between threads within a task, which means more potential for race conditions. (This last point is a little subjective but matches my personal experience.)

> If there is a mechanism for async task producers to coexist alongside with
> code that manually manages the execution order of tasks generated by its task
> graph (thinking of query engine code here a la Quickstep)

Yes. Any of these "small tasks" models work together pretty well. I've taken a decent look at Quickstep and I am confident it will be compatible. Many of these tasks will serve as input jobs for any kind of query engine, so it will need to be.

From Micah:

> One thing that I don't quite understand with this proposal is the scope?

My primary scope is to get blocking I/O waits off of the CPU thread pool (nested parallelism and potentially improved pipelining are bonus benefits).

> Is the intention that most APIs will eventually work with Futures instead of
> raw return values

If a method could potentially perform some kind of long blocking I/O wait, then yes. So reading/writing tables & datasets, IPC, filesystem APIs, etc. will all need to adapt. It doesn't have to happen all at once. CPU-only functions would remain as they are, so table manipulation, compute functions, etc. would not change. For example, there would never be any advantage to creating an asynchronous method to drop a column from a table.

Publicly, things can remain as they are to achieve my primary goal. Once that is complete, it could be a feature enhancement to offer an asynchronous alternative, which is useful for those that are already writing asynchronous code. In that scenario I'd propose following the .NET model and having duplicate APIs (e.g. Read and ReadAsync). Internally, duplicate APIs will probably be needed as things migrate, but once more pieces move to an asynchronous model the synchronous internal API could go away.
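To make the duplicate-API idea concrete, here is a minimal sketch (in Python, with hypothetical names following the .NET Read/ReadAsync convention; this is not an actual Arrow interface). The synchronous variant is just a thin wrapper that blocks on the asynchronous one, so the blocking wait happens off the CPU pool:

```python
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical I/O pool and API pair, for illustration only.
_io_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="io")

def read_table_async(path: str) -> Future:
    """Asynchronous variant: returns a Future immediately; the blocking
    I/O wait happens on the dedicated I/O pool, not a CPU worker."""
    def do_read():
        # Stand-in for a real file read / parse.
        return f"table-from-{path}"
    return _io_pool.submit(do_read)

def read_table(path: str) -> str:
    """Synchronous variant: a thin wrapper that blocks on the async
    variant, mirroring the .NET Read / ReadAsync convention."""
    return read_table_async(path).result()

print(read_table("data.csv"))  # prints: table-from-data.csv
```

The point is that callers who don't care about asynchrony keep the exact API shape they have today, while async-aware callers can compose the Future-returning variant.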
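The "run immediately" mechanism mentioned earlier (follow-up tasks skipping the back of the queue) can also be sketched. This is an illustrative toy, not Arrow's actual Future implementation: when a continuation is attached to an already-finished future, it is invoked inline on the current thread rather than being resubmitted to the pool.

```python
import threading

# Illustrative future type; names are hypothetical.
class SimpleFuture:
    """A future that runs continuations inline once it has completed."""

    def __init__(self):
        self._lock = threading.Lock()
        self._result = None
        self._done = False
        self._callbacks = []

    def add_callback(self, fn):
        with self._lock:
            if not self._done:
                # Not finished yet: defer the follow-up task.
                self._callbacks.append(fn)
                return
        # Already finished: run the follow-up task immediately on the
        # current thread instead of sending it to the back of a queue.
        fn(self._result)

    def mark_finished(self, result):
        with self._lock:
            self._done = True
            self._result = result
            callbacks, self._callbacks = self._callbacks, []
        for fn in callbacks:
            fn(result)

fut = SimpleFuture()
fut.mark_finished(41)
results = []
fut.add_callback(lambda value: results.append(value + 1))
print(results)  # callback ran inline, so this prints [42]
```

This inline-invocation behavior is what lets a chain of small tasks keep a core busy without each link paying a scheduling round trip.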
It's hard to put any kind of number on this, but if you look at the CSV reader, probably 10% or less of the code path (in units of "lines of code") is in methods returning futures, and probably much less once you consider all the code behind the parsing / inferring / type handling / array building. If you think of your code as a tree structure where the nodes are methods and the edges are calls, the asynchronous path is a fairly straight line from the topmost API call down to some I/O handoff lower in the tree. All of the code in the "branches" on either side of this path is synchronous.

On Mon, Feb 15, 2021 at 6:49 PM Micah Kornfield <[email protected]> wrote:
>
> I took a pass through this, thank you for a good discussion of the
> alternative. One thing that I don't quite understand with this proposal is
> the scope? Is the intention that most APIs will eventually work with
> Futures instead of raw return values (i.e. returning a Table or Record
> batch will never be a thing, but instead you get references to
> Future<Table>)?
>
> Thanks,
> Micah
>
> On Mon, Feb 15, 2021 at 2:15 PM Wes McKinney <[email protected]> wrote:
>
> > hi Weston,
> >
> > Thanks for putting this comprehensive and informative document together.
> >
> > There are several layers of problems to consider, just thinking out loud:
> >
> > * I hypothesize that the bottom of the stack is a thread pool with a
> > queue-per-thread that implements work stealing. Some code paths might
> > use this low-level task API directly, for example a workload putting
> > all of its tasks into one particular queue and letting the other
> > threads take work if they are idle.
> >
> > * I've brought this up in the past, but if we are comfortable with
> > more threads than CPU cores, we may allow for the base level thread
> > pool to be expanded dynamically. The tradeoff here is coarse
> > granularity context switching between tasks only at time of task
> > completion vs.
> > the OS context-switching mid-task between threads. For
> > example, if there is a code path which wishes to guarantee that a
> > thread is being put to work right away to execute its tasks, even if
> > all of the other queues are full of other tasks, then this could
> > partially address the task prioritization problem discussed in the
> > document. If there is a notion of a "task producer" or a "workload"
> > and then the number of task producers exceeds the size of the thread
> > pool, then additional an thread+dedicated task queue for that thread
> > could be created to handle tasks submitted by the producer. Maybe this
> > is a bad idea (I'm not an expert in this domain after all), let me
> > know if it doesn't make sense.
> >
> > * I agree that we should encourage as much code as possible to use the
> > asynchronous model — per above, if there is a mechanism for async task
> > producers to coexist alongside with code that manually manages the
> > execution order of tasks generated by its task graph (thinking of
> > query engine code here a la Quickstep), then that might be good.
> >
> > Lots to do here but excited to see things evolve here and see the
> > project grow faster and more scalable on systems with a lot of cores
> > that do a lot of mixed IO/CPU work!
> >
> > - Wes
> >
> > On Tue, Feb 2, 2021 at 9:02 PM Weston Pace <[email protected]> wrote:
> > >
> > > This is a follow up to a discussion from last September [3]. I've
> > > been investigating Arrow's use of threading and I/O and I believe
> > > there are some improvements that could be made. Arrow is currently
> > > supporting two threading options (single thread and "per-core" thread
> > > pool). Both of these approaches are hindered if blocking I/O is
> > > performed on a CPU worker thread.
> > >
> > > It is somewhat alleviated by using background threads for I/O (in the
> > > readahead iterator) but this implementation is not complete and does
> > > not allow for nested parallelism.
> > > I would like to convert Arrow's I/O
> > > operations to an asynchronous model (expanding on the existing futures
> > > API). I have already converted the CSV reader in this fashion [2] as
> > > a proof of concept.
> > >
> > > I have written a more detailed proposal here [1]. Please feel free to
> > > suggest improvements or alternate approaches. Also, please let me
> > > know if I missed any goals or considerations I should keep in mind.
> > >
> > > Also, hello, this email is a bit of an introduction. I have
> > > previously made one or two small comments/changes but I am hoping to
> > > be more involved going forwards. I've mostly worked on proprietary
> > > test and measurement software but have recently joined Ursa Computing
> > > which will allow me more time to work on Arrow.
> > >
> > > Thanks,
> > >
> > > Weston Pace
> > >
> > > [1] https://docs.google.com/document/d/1tO2WwYL-G2cB_MCPqYguKjKkRT7mZ8C2Gc9ONvspfgo/edit?usp=sharing
> > > [2] https://github.com/apache/arrow/pull/9095
> > > [3] https://mail-archives.apache.org/mod_mbox/arrow-dev/202009.mbox/%3CCAJPUwMDmU3rFt6Upyis%3DyXB%3DECkmrjdncgR9xj%3DDFapJt9FfUg%40mail.gmail.com%3E
