Thanks for this useful writeup and the ensuing discussion.  What you're
proposing basically looks sound to me.

Regards

Antoine.


On 16/02/2021 09:29, Weston Pace wrote:
> Thanks for the input.  I appreciate you both taking the time to look
> through this.  I'll consolidate the points here.
> 
> From Wes:
> 
>>  I hypothesize that the bottom of the stack is a thread pool with a
> queue-per-thread that implements work stealing.
> 
> Yes.  I think there may be two pools (one for I/O and one for CPU) but
> for the work I'm intending (unblocking deadlocks and getting I/O off
> the CPU pool) what is there today is sufficient.  Moving to work stealing
> would be a secondary priority.  It's possible a query scheduler might
> have special needs.  For example, Quickstep wants a dedicated "master
> scheduler" thread which cannot be stolen from.
> 
>> Some code paths might use this low-level task API directly
> 
> Yes, I have no intention of completely shadowing the lower level
> executor, only adding optional utilities useful for building
> pipelines.
> 
>> I've brought this up in the past, but if we are comfortable with more 
>> threads than CPU cores...let me know if it doesn't make sense.
> 
> It makes sense and other projects (e.g. C#, Postgres) use this kind of
> "more threads than CPU" model.  It can be simpler because it doesn't
> require you to split a long-running job with blocking waits into
> smaller jobs, but it won't offer more capability/functionality.  Both
> the Quickstep model and the asynchronous generators model have
> mechanisms in place so that "follow-up tasks" which are ready to run
> will run immediately without going to the back of the queue.  This is
> done by running a callback immediately instead of submitting it to the
> thread pool.
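That mechanism (running a ready continuation inline rather than re-queueing it) can be sketched with a toy future. This is illustrative only, not Arrow's actual Future API:

```cpp
#include <functional>
#include <mutex>
#include <optional>

// Toy future: if a callback is attached after the value is already
// available, the callback runs inline on the caller's thread rather
// than going to the back of a thread pool queue.
template <typename T>
class ToyFuture {
 public:
  void MarkFinished(T value) {
    std::lock_guard<std::mutex> lock(mutex_);
    value_ = std::move(value);
    // A follow-up task was already registered: run it immediately.
    if (callback_) callback_(*value_);
  }

  void AddCallback(std::function<void(const T&)> cb) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (value_) {
      cb(*value_);  // already finished: run inline, skip the queue
    } else {
      callback_ = std::move(cb);  // run later, when MarkFinished fires
    }
  }

 private:
  std::mutex mutex_;
  std::optional<T> value_;
  std::function<void(const T&)> callback_;
};
```

Either way the follow-up runs as soon as its input is ready, without waiting behind unrelated queued tasks.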
> 
> The downsides of the "more threads" approach are:
> * It does not support the single thread case well
> * It puts scheduling in the hands of the OS.  It can actually prevent
> the benefit you described.  If you create a new thread / task pool to
> address some "run right away" tasks then there is no guarantee the new
> thread will be given a core by the OS.
> * It often leads to more communication between threads within a task,
> which means more potential for race conditions (this last point is a
> little subjective but matches my personal experience)
> 
>>  If there is a mechanism for async task producers to coexist alongside 
>> code that manually manages the execution order of tasks generated by its 
>> task graph (thinking of query engine code here a la Quickstep)
> 
> Yes.  Any of these "small tasks" models work together pretty well.
> I've taken a decent look at Quickstep and I am confident it will be
> compatible.  Many of these tasks will serve as input jobs for any kind
> of query engine so it will need to be.
> 
> From Micah:
> 
>> One thing that I don't quite understand with this proposal is the scope?
> 
> My primary scope is to get blocking I/O waits off of the CPU thread
> pool (nested parallelism and potentially improved pipelining are bonus
> benefits).
> 
>> Is the intention that most APIs will eventually work with Futures instead of 
>> raw return values
> 
> If a method could potentially perform some kind of long-running
> blocking I/O wait then yes.  So reading / writing tables & datasets,
> IPC, filesystem APIs, etc. will all need to adapt.  It doesn't have to
> be all at once.  CPU-only functions (table manipulation, compute
> functions, etc.) would remain as they are.  For example, there would
> never be any advantage to creating an asynchronous method to drop a
> column from a table.
> 
> Publicly, things can remain as they are to achieve my primary goal.
> Once that is complete it could be a feature enhancement to allow an
> asynchronous alternative.  It's useful for those that are already
> doing asynchronous code.  In that scenario I'd propose following the
> .NET model and having duplicate APIs (e.g. Read and ReadAsync).
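As a hedged sketch of that duplicate-API pattern, the synchronous variant can be a thin blocking wrapper over the asynchronous one. The class and method names below are hypothetical, not Arrow's actual reader API, and std::future stands in for Arrow's Future type:

```cpp
#include <future>
#include <string>

// Hypothetical sketch of the .NET-style Read/ReadAsync pairing.
class TableReader {
 public:
  // Asynchronous variant: kicks off the (potentially blocking) I/O on
  // another thread and returns a future immediately.
  std::future<std::string> ReadAsync() {
    return std::async(std::launch::async, [] {
      // ... blocking I/O would happen here, off the caller's thread ...
      return std::string("table-contents");
    });
  }

  // Synchronous variant: a thin wrapper that blocks on the async result.
  std::string Read() { return ReadAsync().get(); }
};
```

Callers already written in an asynchronous style compose on the future; everyone else keeps the familiar blocking call.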
> 
> Internally, duplicate APIs will probably be needed as things migrate
> but once more pieces move to an asynchronous model, the
> synchronous API could go away.  It's hard to put a number on
> this, but if you look at the CSV reader then probably 10% or less
> of the code path (in units of "lines of code") is in methods returning
> futures.  Probably much smaller once you consider all the code behind
> the parsing / inferring / type handling / array building.  If you
> think of your code as a tree structure where the nodes are methods and
> the edges are calls, the asynchronous path is a fairly straight line
> from the topmost API call to some I/O handoff lower down the tree.
> All of the code in the "branches" on either side of this path is
> synchronous.
> 
> On Mon, Feb 15, 2021 at 6:49 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>
>> I took a pass through this, thank you for a good discussion of the
>> alternative.  One thing that I don't quite understand with this proposal is
>> the scope?  Is the intention that most APIs will eventually work with
>> Futures instead of raw return values (i.e. returning a Table or Record
>> batch will never be a thing, but instead you get references to
>> Future<Table>)?
>>
>> Thanks,
>> Micah
>>
>> On Mon, Feb 15, 2021 at 2:15 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>
>>> hi Weston,
>>>
>>> Thanks for putting this comprehensive and informative document together.
>>>
>>> There are several layers of problems to consider, just thinking out loud:
>>>
>>> * I hypothesize that the bottom of the stack is a thread pool with a
>>> queue-per-thread that implements work stealing. Some code paths might
>>> use this low-level task API directly, for example a workload putting
>>> all of its tasks into one particular queue and letting the other
>>> threads take work if they are idle.
>>>
>>> * I've brought this up in the past, but if we are comfortable with
>>> more threads than CPU cores, we may allow for the base level thread
>>> pool to be expanded dynamically. The tradeoff here is coarse
>>> granularity context switching between tasks only at time of task
>>> completion vs. the OS context-switching mid-task between threads. For
>>> example, if there is a code path which wishes to guarantee that a
>>> thread is being put to work right away to execute its tasks, even if
>>> all of the other queues are full of other tasks, then this could
>>> partially address the task prioritization problem discussed in the
>>> document. If there is a notion of a "task producer" or a "workload"
>>> and then the number of task producers exceeds the size of the thread
>>> pool, then an additional thread + dedicated task queue for that thread
>>> could be created to handle tasks submitted by the producer. Maybe this
>>> is a bad idea (I'm not an expert in this domain after all), let me
>>> know if it doesn't make sense.
>>>
>>> * I agree that we should encourage as much code as possible to use the
>>> asynchronous model — per above, if there is a mechanism for async task
>>> producers to coexist alongside code that manually manages the
>>> execution order of tasks generated by its task graph (thinking of
>>> query engine code here a la Quickstep), then that might be good.
>>>
>>> Lots to do here but excited to see things evolve here and see the
>>> project grow faster and more scalable on systems with a lot of cores
>>> that do a lot of mixed IO/CPU work!
>>>
>>> - Wes
>>>
>>> On Tue, Feb 2, 2021 at 9:02 PM Weston Pace <weston.p...@gmail.com> wrote:
>>>>
>>>> This is a follow up to a discussion from last September [3].  I've
>>>> been investigating Arrow's use of threading and I/O and I believe
>>>> there are some improvements that could be made.  Arrow is currently
>>>> supporting two threading options (single thread and "per-core" thread
>>>> pool).  Both of these approaches are hindered if blocking I/O is
>>>> performed on a CPU worker thread.
>>>>
>>>> It is somewhat alleviated by using background threads for I/O (in the
>>>> readahead iterator) but this implementation is not complete and does
>>>> not allow for nested parallelism.  I would like to convert Arrow's I/O
>>>> operations to an asynchronous model (expanding on the existing futures
>>>> API).  I have already converted the CSV reader in this fashion [2] as
>>>> a proof of concept.
>>>>
>>>> I have written a more detailed proposal here [1].  Please feel free to
>>>> suggest improvements or alternate approaches.  Also, please let me
>>>> know if I missed any goals or considerations I should keep in mind.
>>>>
>>>> Also, hello, this email is a bit of an introduction.  I have
>>>> previously made one or two small comments/changes but I am hoping to
>>>> be more involved going forwards.  I've mostly worked on proprietary
>>>> test and measurement software but have recently joined Ursa Computing
>>>> which will allow me more time to work on Arrow.
>>>>
>>>> Thanks,
>>>>
>>>> Weston Pace
>>>>
>>>> [1]
>>> https://docs.google.com/document/d/1tO2WwYL-G2cB_MCPqYguKjKkRT7mZ8C2Gc9ONvspfgo/edit?usp=sharing
>>>> [2] https://github.com/apache/arrow/pull/9095
>>>> [3]
>>> https://mail-archives.apache.org/mod_mbox/arrow-dev/202009.mbox/%3CCAJPUwMDmU3rFt6Upyis%3DyXB%3DECkmrjdncgR9xj%3DDFapJt9FfUg%40mail.gmail.com%3E
>>>
