Thanks for the input. I appreciate you both taking the time to look through this. I'll consolidate the points here.
From Wes:

> I hypothesize that the bottom of the stack is a thread pool with a
> queue-per-thread that implements work stealing.

Yes. I think there may be two pools (one for I/O and one for CPU), but for the work I'm intending (unblocking deadlocks and getting I/O off the CPU pool) what is there today is sufficient. Moving to work stealing would be a secondary priority. It's possible a query scheduler might have special needs. For example, Quickstep wants a dedicated "master scheduler" thread which cannot be stolen from.

> Some code paths might use this low-level task API directly

Yes, I have no intention of completely shadowing the lower-level executor, only adding optional utilities useful for building pipelines.

> I've brought this up in the past, but if we are comfortable with more threads
> than CPU cores...let me know if it doesn't make sense.

It makes sense, and other projects (e.g. C#, Postgres) use this kind of "more threads than CPUs" model. It can be simpler because it doesn't require you to split a long-running job with blocking waits into smaller jobs, but it won't offer more capability/functionality. Both the Quickstep model and the asynchronous generators model have mechanisms in place so that "follow-up tasks" which are ready to run will run immediately without going to the back of the queue. This is done by running a callback immediately instead of submitting it to the thread pool.

The downsides of the "more threads" approach are:

* It does not support the single-threaded case well.
* It puts scheduling in the hands of the OS. This can actually prevent the benefit you described: if you create a new thread / task pool to address some "run right away" tasks, there is no guarantee the new thread will be given a core by the OS.
* It often leads to more communication between threads within a task, which means more potential for race conditions. (This last point is a little subjective but matches my personal experience.)

> If there is a mechanism for async task producers to coexist alongside with
> code that manually manages the execution order of tasks generated by its task
> graph (thinking of query engine code here a la Quickstep)

Yes. Any of these "small tasks" models work together pretty well. I've taken a decent look at Quickstep and I am confident it will be compatible. Many of these tasks will serve as input jobs for any kind of query engine, so it will need to be.

From Micah:

> One thing that I don't quite understand with this proposal is the scope?

My primary scope is to get blocking I/O waits off of the CPU thread pool (nested parallelism and potentially improved pipelining are bonus benefits).

> Is the intention that most APIs will eventually work with Futures instead of
> raw return values

If a method could potentially perform some kind of long blocking I/O wait, then yes. So reading/writing tables & datasets, IPC, filesystem APIs, etc. will all need to adapt. It doesn't have to happen all at once. CPU-only functions would remain as they are, so table manipulation, compute functions, etc. would not change. For example, there would never be any advantage to creating an asynchronous method to drop a column from a table.

Publicly, things can remain as they are to achieve my primary goal. Once that is complete, it could be a feature enhancement to offer an asynchronous alternative, which is useful for those that are already writing asynchronous code. In that scenario I'd propose following the .NET model and having duplicate APIs (e.g. Read and ReadAsync). Internally, duplicate APIs will probably be needed as things migrate, but once more pieces move to an asynchronous model the synchronous internal API could go away.
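To make the duplicate-API idea concrete, here is a minimal sketch (in Python, with hypothetical names following the .NET Read/ReadAsync convention; this is not an actual Arrow interface). The synchronous variant is just a thin wrapper that blocks on the asynchronous one, so the blocking wait happens off the CPU pool:

```python
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical I/O pool and API pair, for illustration only.
_io_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="io")

def read_table_async(path: str) -> Future:
    """Asynchronous variant: returns a Future immediately; the blocking
    I/O wait happens on the dedicated I/O pool, not a CPU worker."""
    def do_read():
        # Stand-in for a real file read / parse.
        return f"table-from-{path}"
    return _io_pool.submit(do_read)

def read_table(path: str) -> str:
    """Synchronous variant: a thin wrapper that blocks on the async
    variant, mirroring the .NET Read / ReadAsync convention."""
    return read_table_async(path).result()

print(read_table("data.csv"))  # prints: table-from-data.csv
```

The point is that callers who don't care about asynchrony keep the exact API shape they have today, while async-aware callers can compose the Future-returning variant.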
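The "run immediately" mechanism mentioned earlier (follow-up tasks skipping the back of the queue) can also be sketched. This is an illustrative toy, not Arrow's actual Future implementation: when a continuation is attached to an already-finished future, it is invoked inline on the current thread rather than being resubmitted to the pool.

```python
import threading

# Illustrative future type; names are hypothetical.
class SimpleFuture:
    """A future that runs continuations inline once it has completed."""

    def __init__(self):
        self._lock = threading.Lock()
        self._result = None
        self._done = False
        self._callbacks = []

    def add_callback(self, fn):
        with self._lock:
            if not self._done:
                # Not finished yet: defer the follow-up task.
                self._callbacks.append(fn)
                return
        # Already finished: run the follow-up task immediately on the
        # current thread instead of sending it to the back of a queue.
        fn(self._result)

    def mark_finished(self, result):
        with self._lock:
            self._done = True
            self._result = result
            callbacks, self._callbacks = self._callbacks, []
        for fn in callbacks:
            fn(result)

fut = SimpleFuture()
fut.mark_finished(41)
results = []
fut.add_callback(lambda value: results.append(value + 1))
print(results)  # callback ran inline, so this prints [42]
```

This inline-invocation behavior is what lets a chain of small tasks keep a core busy without each link paying a scheduling round trip.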
It's hard to put any kind of number on this, but if you look at the CSV reader, probably 10% or less of the code path (in units of "lines of code") is in methods returning futures, and probably much less once you consider all the code behind the parsing / inferring / type handling / array building. If you think of your code as a tree structure where the nodes are methods and the edges are calls, the asynchronous path is a fairly straight line from the topmost API call down to some I/O handoff lower in the tree. All of the code in the "branches" on either side of this path is synchronous.

On Mon, Feb 15, 2021 at 6:49 PM Micah Kornfield <[email protected]> wrote:
>
> I took a pass through this, thank you for a good discussion of the
> alternative. One thing that I don't quite understand with this proposal is
> the scope? Is the intention that most APIs will eventually work with
> Futures instead of raw return values (i.e. returning a Table or Record
> batch will never be a thing, but instead you get references to
> Future<Table>)?
>
> Thanks,
> Micah
>
> On Mon, Feb 15, 2021 at 2:15 PM Wes McKinney <[email protected]> wrote:
>
> > hi Weston,
> >
> > Thanks for putting this comprehensive and informative document together.
> >
> > There are several layers of problems to consider, just thinking out loud:
> >
> > * I hypothesize that the bottom of the stack is a thread pool with a
> > queue-per-thread that implements work stealing. Some code paths might
> > use this low-level task API directly, for example a workload putting
> > all of its tasks into one particular queue and letting the other
> > threads take work if they are idle.
> >
> > * I've brought this up in the past, but if we are comfortable with
> > more threads than CPU cores, we may allow for the base level thread
> > pool to be expanded dynamically. The tradeoff here is coarse
> > granularity context switching between tasks only at time of task
> > completion vs.
> > the OS context-switching mid-task between threads. For
> > example, if there is a code path which wishes to guarantee that a
> > thread is being put to work right away to execute its tasks, even if
> > all of the other queues are full of other tasks, then this could
> > partially address the task prioritization problem discussed in the
> > document. If there is a notion of a "task producer" or a "workload"
> > and then the number of task producers exceeds the size of the thread
> > pool, then additional an thread+dedicated task queue for that thread
> > could be created to handle tasks submitted by the producer. Maybe this
> > is a bad idea (I'm not an expert in this domain after all), let me
> > know if it doesn't make sense.
> >
> > * I agree that we should encourage as much code as possible to use the
> > asynchronous model — per above, if there is a mechanism for async task
> > producers to coexist alongside with code that manually manages the
> > execution order of tasks generated by its task graph (thinking of
> > query engine code here a la Quickstep), then that might be good.
> >
> > Lots to do here but excited to see things evolve here and see the
> > project grow faster and more scalable on systems with a lot of cores
> > that do a lot of mixed IO/CPU work!
> >
> > - Wes
> >
> > On Tue, Feb 2, 2021 at 9:02 PM Weston Pace <[email protected]> wrote:
> > >
> > > This is a follow up to a discussion from last September [3]. I've
> > > been investigating Arrow's use of threading and I/O and I believe
> > > there are some improvements that could be made. Arrow is currently
> > > supporting two threading options (single thread and "per-core" thread
> > > pool). Both of these approaches are hindered if blocking I/O is
> > > performed on a CPU worker thread.
> > >
> > > It is somewhat alleviated by using background threads for I/O (in the
> > > readahead iterator) but this implementation is not complete and does
> > > not allow for nested parallelism.
> > > I would like to convert Arrow's I/O
> > > operations to an asynchronous model (expanding on the existing futures
> > > API). I have already converted the CSV reader in this fashion [2] as
> > > a proof of concept.
> > >
> > > I have written a more detailed proposal here [1]. Please feel free to
> > > suggest improvements or alternate approaches. Also, please let me
> > > know if I missed any goals or considerations I should keep in mind.
> > >
> > > Also, hello, this email is a bit of an introduction. I have
> > > previously made one or two small comments/changes but I am hoping to
> > > be more involved going forwards. I've mostly worked on proprietary
> > > test and measurement software but have recently joined Ursa Computing
> > > which will allow me more time to work on Arrow.
> > >
> > > Thanks,
> > >
> > > Weston Pace
> > >
> > > [1] https://docs.google.com/document/d/1tO2WwYL-G2cB_MCPqYguKjKkRT7mZ8C2Gc9ONvspfgo/edit?usp=sharing
> > > [2] https://github.com/apache/arrow/pull/9095
> > > [3] https://mail-archives.apache.org/mod_mbox/arrow-dev/202009.mbox/%3CCAJPUwMDmU3rFt6Upyis%3DyXB%3DECkmrjdncgR9xj%3DDFapJt9FfUg%40mail.gmail.com%3E
