Thanks for this useful writeup and the ensuing discussion. What you're proposing basically looks sound to me.

Regards,
Antoine.

On 16/02/2021 at 09:29, Weston Pace wrote:
> Thanks for the input. I appreciate you both taking the time to look
> through this. I'll consolidate the points here.
>
> From Wes:
>
>> I hypothesize that the bottom of the stack is a thread pool with a
>> queue-per-thread that implements work stealing.
>
> Yes. I think there may be two pools (one for I/O and one for CPU), but
> for the work I'm intending (avoiding deadlocks and getting I/O off the
> CPU pool) what is there today is sufficient. Moving to work stealing
> would be a secondary priority. It's possible a query scheduler might
> have special needs. For example, Quickstep wants a dedicated "master
> scheduler" thread which cannot be stolen from.
>
>> Some code paths might use this low-level task API directly
>
> Yes, I have no intention of completely shadowing the lower-level
> executor, only adding optional utilities useful for building
> pipelines.
>
>> I've brought this up in the past, but if we are comfortable with more
>> threads than CPU cores...let me know if it doesn't make sense.
>
> It makes sense, and other projects (e.g. C#, PostgreSQL) use this kind
> of "more threads than CPUs" model. It can be simpler because it
> doesn't require you to split a long-running job with blocking waits
> into smaller jobs, but it won't offer more capability/functionality.
> Both the Quickstep model and the asynchronous generators model have
> mechanisms in place so that "follow-up tasks" which are ready to run
> will run immediately without going to the back of the queue. This is
> done by running a callback immediately instead of submitting it to the
> thread pool (see the continuation sketch at the end of this thread).
>
> The downsides of the "more threads" approach are:
> * It does not support the single-threaded case well.
> * It puts scheduling in the hands of the OS. It can actually prevent
>   the benefit you described: if you create a new thread / task pool to
>   address some "run right away" tasks, there is no guarantee the new
>   thread will be given a core by the OS.
> * It often leads to more communication between threads within a task,
>   which means more potential for race conditions (this last point is a
>   little subjective but matches my personal experience).
>
>> If there is a mechanism for async task producers to coexist alongside
>> code that manually manages the execution order of tasks generated by
>> its task graph (thinking of query engine code here a la Quickstep)
>
> Yes. Any of these "small tasks" models work together pretty well.
> I've taken a decent look at Quickstep and I am confident it will be
> compatible. Many of these tasks will serve as input jobs for any kind
> of query engine, so it will need to be.
>
> From Micah:
>
>> One thing that I don't quite understand with this proposal is the scope?
>
> My primary scope is to get blocking I/O waits off of the CPU thread
> pool (nested parallelism and potentially improved pipelining are bonus
> benefits).
>
>> Is the intention that most APIs will eventually work with Futures
>> instead of raw return values
>
> If a method could potentially run some kind of long blocking I/O wait,
> then yes. So reading / writing tables & datasets, IPC, filesystem
> APIs, etc. will all need to adapt. It doesn't have to be all at once.
> CPU-only functions would remain as they are, so table manipulation,
> compute functions, etc. would stay unchanged. For example, there would
> never be any advantage to creating an asynchronous method to drop a
> column from a table.
>
> Publicly, things can remain as they are to achieve my primary goal.
> Once that is complete, it could be a feature enhancement to allow an
> asynchronous alternative. It's useful for those that are already
> writing asynchronous code. In that scenario I'd propose following the
> .NET model and having duplicate APIs (e.g. Read and ReadAsync; see the
> Read/ReadAsync sketch at the end of this thread).
>
> Internally, duplicate APIs will probably be needed as things migrate,
> but once more pieces move to an asynchronous model the synchronous API
> could go away. It's hard to put a number on this, but if you look at
> the CSV reader then probably 10% or less of the code path (in units of
> "lines of code") is in methods returning futures. It is probably much
> smaller once you consider all the code behind the parsing / inferring
> / type handling / array building. If you think of your code as a tree
> structure where the nodes are methods and the edges are calls, the
> asynchronous path is a fairly straight line from the topmost API call
> to some I/O handoff lower down the tree. All of the code in the
> "branches" on either side of this path is synchronous.
>
> On Mon, Feb 15, 2021 at 6:49 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>
>> I took a pass through this, thank you for a good discussion of the
>> alternatives. One thing that I don't quite understand with this
>> proposal is the scope? Is the intention that most APIs will eventually
>> work with Futures instead of raw return values (i.e. returning a Table
>> or RecordBatch will never be a thing, but instead you get references
>> to Future<Table>)?
>>
>> Thanks,
>> Micah
>>
>> On Mon, Feb 15, 2021 at 2:15 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>
>>> hi Weston,
>>>
>>> Thanks for putting this comprehensive and informative document together.
>>>
>>> There are several layers of problems to consider, just thinking out loud:
>>>
>>> * I hypothesize that the bottom of the stack is a thread pool with a
>>> queue-per-thread that implements work stealing. Some code paths might
>>> use this low-level task API directly, for example a workload putting
>>> all of its tasks into one particular queue and letting the other
>>> threads take work if they are idle.
>>>
>>> * I've brought this up in the past, but if we are comfortable with
>>> more threads than CPU cores, we may allow the base-level thread pool
>>> to be expanded dynamically. The tradeoff here is coarse-grained
>>> context switching between tasks (only at task completion) vs. the OS
>>> context-switching mid-task between threads. For example, if there is
>>> a code path which wishes to guarantee that a thread is put to work
>>> right away to execute its tasks, even if all of the other queues are
>>> full of other tasks, then this could partially address the task
>>> prioritization problem discussed in the document. If there is a
>>> notion of a "task producer" or a "workload" and the number of task
>>> producers exceeds the size of the thread pool, then an additional
>>> thread + dedicated task queue for that thread could be created to
>>> handle tasks submitted by the producer. Maybe this is a bad idea (I'm
>>> not an expert in this domain after all), let me know if it doesn't
>>> make sense.
>>>
>>> * I agree that we should encourage as much code as possible to use
>>> the asynchronous model; per above, if there is a mechanism for async
>>> task producers to coexist alongside code that manually manages the
>>> execution order of tasks generated by its task graph (thinking of
>>> query engine code here a la Quickstep), then that might be good.
>>>
>>> Lots to do here, but I'm excited to see things evolve and to see the
>>> project become faster and more scalable on systems with a lot of
>>> cores that do a lot of mixed I/O and CPU work!
>>>
>>> - Wes
>>>
>>> On Tue, Feb 2, 2021 at 9:02 PM Weston Pace <weston.p...@gmail.com> wrote:
>>>>
>>>> This is a follow-up to a discussion from last September [3]. I've
>>>> been investigating Arrow's use of threading and I/O and I believe
>>>> there are some improvements that could be made. Arrow currently
>>>> supports two threading options (a single thread and a "per-core"
>>>> thread pool). Both of these approaches are hindered if blocking I/O
>>>> is performed on a CPU worker thread.
>>>>
>>>> This is somewhat alleviated by using background threads for I/O (in
>>>> the readahead iterator), but that implementation is not complete and
>>>> does not allow for nested parallelism. I would like to convert
>>>> Arrow's I/O operations to an asynchronous model (expanding on the
>>>> existing futures API). I have already converted the CSV reader in
>>>> this fashion [2] as a proof of concept.
>>>>
>>>> I have written a more detailed proposal here [1]. Please feel free
>>>> to suggest improvements or alternate approaches. Also, please let me
>>>> know if I missed any goals or considerations I should keep in mind.
>>>>
>>>> Also, hello, this email is a bit of an introduction. I have
>>>> previously made one or two small comments/changes, but I am hoping
>>>> to be more involved going forward. I've mostly worked on proprietary
>>>> test and measurement software but have recently joined Ursa
>>>> Computing, which will allow me more time to work on Arrow.
>>>>
>>>> Thanks,
>>>>
>>>> Weston Pace
>>>>
>>>> [1] https://docs.google.com/document/d/1tO2WwYL-G2cB_MCPqYguKjKkRT7mZ8C2Gc9ONvspfgo/edit?usp=sharing
>>>> [2] https://github.com/apache/arrow/pull/9095
>>>> [3] https://mail-archives.apache.org/mod_mbox/arrow-dev/202009.mbox/%3CCAJPUwMDmU3rFt6Upyis%3DyXB%3DECkmrjdncgR9xj%3DDFapJt9FfUg%40mail.gmail.com%3E
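
Continuation sketch: to make the "run the follow-up callback immediately" point from Weston's reply concrete, here is a minimal, self-contained C++ sketch. It is not Arrow's Future implementation; SimpleFuture, Then, and MarkFinished are hypothetical names. The idea is simply that whichever thread completes the value invokes the registered continuation inline, instead of pushing a new task to the back of the thread pool's queue.

// Minimal sketch of "run the continuation inline" (hypothetical names,
// not Arrow's Future API).
#include <functional>
#include <iostream>
#include <mutex>
#include <optional>

template <typename T>
class SimpleFuture {
 public:
  // Register a follow-up task. If the value is already available, run it
  // immediately on the calling thread instead of re-queuing it.
  void Then(std::function<void(const T&)> callback) {
    std::unique_lock<std::mutex> lock(mutex_);
    if (value_.has_value()) {
      T value = *value_;
      lock.unlock();
      callback(value);  // runs right away, no trip to the back of the queue
    } else {
      callback_ = std::move(callback);
    }
  }

  // Called by the producing thread (e.g. an I/O thread) when the result is
  // ready; any registered continuation runs on that same thread.
  void MarkFinished(T value) {
    std::unique_lock<std::mutex> lock(mutex_);
    value_ = std::move(value);
    auto callback = std::move(callback_);
    lock.unlock();
    if (callback) {
      callback(*value_);  // continuation runs inline, not via the pool
    }
  }

 private:
  std::mutex mutex_;
  std::optional<T> value_;
  std::function<void(const T&)> callback_;
};

int main() {
  SimpleFuture<int> fut;
  fut.Then([](const int& v) { std::cout << "follow-up task saw " << v << "\n"; });
  fut.MarkFinished(42);  // the continuation fires on this thread, immediately
}

In the proposal, such a continuation would typically be chained onto a future returned by an I/O operation, so a follow-up parse or decode task can start as soon as the read finishes rather than waiting its turn behind unrelated queued tasks.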
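Read/ReadAsync sketch: likewise, the "duplicate APIs" idea (Read and ReadAsync, following the .NET naming convention) combined with the primary goal of keeping blocking I/O waits off the CPU threads could look roughly like the sketch below. This is only an illustration under assumptions: CsvSource and Table are placeholder names, not Arrow types, and std::async stands in for submitting the blocking read to a dedicated I/O executor.

// Sketch of the duplicate Read / ReadAsync pattern (hypothetical names).
#include <chrono>
#include <future>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

using Table = std::vector<std::string>;  // placeholder for a real table type

class CsvSource {
 public:
  // Asynchronous variant: returns immediately with a future. The blocking
  // I/O wait happens off the caller's (CPU) thread.
  std::future<Table> ReadAsync() {
    return std::async(std::launch::async, [] {
      // Stand-in for a slow filesystem or network read.
      std::this_thread::sleep_for(std::chrono::milliseconds(50));
      return Table{"row 1", "row 2"};
    });
  }

  // Synchronous variant: a thin wrapper that simply waits on the async one,
  // so existing blocking call sites keep working unchanged.
  Table Read() { return ReadAsync().get(); }
};

int main() {
  CsvSource source;

  // Existing synchronous callers stay as they are...
  Table t1 = source.Read();

  // ...while asynchronous callers can overlap the wait with CPU work.
  std::future<Table> pending = source.ReadAsync();
  // (CPU-bound work could run here while the read is in flight)
  Table t2 = pending.get();

  std::cout << t1.size() + t2.size() << " rows read\n";
}

The synchronous Read is just a thin wrapper that waits on ReadAsync, which matches the migration path described above: existing blocking call sites keep working while new code can overlap the wait with CPU-bound work or chain further tasks onto the returned future.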