Hi Weston,

Thanks for putting this comprehensive and informative document together.
There are several layers of problems to consider; just thinking out loud:

* I hypothesize that the bottom of the stack is a thread pool with a queue-per-thread that implements work stealing. Some code paths might use this low-level task API directly, for example a workload putting all of its tasks into one particular queue and letting the other threads take work if they are idle.

* I've brought this up in the past, but if we are comfortable with more threads than CPU cores, we may allow the base-level thread pool to be expanded dynamically. The tradeoff here is coarse-grained context switching between tasks only at time of task completion vs. the OS context-switching mid-task between threads. For example, if there is a code path which wishes to guarantee that a thread is put to work right away to execute its tasks, even if all of the other queues are full of other tasks, then this could partially address the task prioritization problem discussed in the document. If there is a notion of a "task producer" or a "workload", and the number of task producers exceeds the size of the thread pool, then an additional thread + dedicated task queue for that thread could be created to handle tasks submitted by the producer. Maybe this is a bad idea (I'm not an expert in this domain, after all); let me know if it doesn't make sense.

* I agree that we should encourage as much code as possible to use the asynchronous model — per above, if there is a mechanism for async task producers to coexist alongside code that manually manages the execution order of tasks generated by its task graph (thinking of query engine code here, a la Quickstep), then that might be good.

Lots to do here, but I'm excited to see things evolve and see the project grow faster and more scalable on systems with a lot of cores that do a lot of mixed IO/CPU work!
- Wes

On Tue, Feb 2, 2021 at 9:02 PM Weston Pace <weston.p...@gmail.com> wrote:
>
> This is a follow up to a discussion from last September [3]. I've
> been investigating Arrow's use of threading and I/O and I believe
> there are some improvements that could be made. Arrow is currently
> supporting two threading options (single thread and "per-core" thread
> pool). Both of these approaches are hindered if blocking I/O is
> performed on a CPU worker thread.
>
> It is somewhat alleviated by using background threads for I/O (in the
> readahead iterator) but this implementation is not complete and does
> not allow for nested parallelism. I would like to convert Arrow's I/O
> operations to an asynchronous model (expanding on the existing futures
> API). I have already converted the CSV reader in this fashion [2] as
> a proof of concept.
>
> I have written a more detailed proposal here [1]. Please feel free to
> suggest improvements or alternate approaches. Also, please let me
> know if I missed any goals or considerations I should keep in mind.
>
> Also, hello, this email is a bit of an introduction. I have
> previously made one or two small comments/changes but I am hoping to
> be more involved going forwards. I've mostly worked on proprietary
> test and measurement software but have recently joined Ursa Computing
> which will allow me more time to work on Arrow.
>
> Thanks,
>
> Weston Pace
>
> [1] https://docs.google.com/document/d/1tO2WwYL-G2cB_MCPqYguKjKkRT7mZ8C2Gc9ONvspfgo/edit?usp=sharing
> [2] https://github.com/apache/arrow/pull/9095
> [3] https://mail-archives.apache.org/mod_mbox/arrow-dev/202009.mbox/%3CCAJPUwMDmU3rFt6Upyis%3DyXB%3DECkmrjdncgR9xj%3DDFapJt9FfUg%40mail.gmail.com%3E