Weston Pace created ARROW-16522:
-----------------------------------

             Summary: [C++] Evolution of exec plan
                 Key: ARROW-16522
                 URL: https://issues.apache.org/jira/browse/ARROW-16522
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Weston Pace


This is a high level JIRA that will be broken into a number of subtasks.  There 
are a number of things that each node has to reinvent (or, more likely, 
copy/paste) and it we probably know enough at this point that we can enhance 
the ExecNode/ExecPlan API a bit to move some of the burden out of the nodes and 
into the plan.  At a high level:

 * All nodes that schedule tasks use an async task group so they can make sure 
that all scheduled tasks are finished before the node is marked finish.  Task 
scheduling could happen at the plan level and there would be one spot keeping 
track of running tasks.
 * If we have centralized scheduling then that opens up the door for a 
centralized handling of errors and backpressure.  A node's default error 
behavior is then "stop scheduling tasks" and a node's default backpressure 
behavior is "stop running tasks".
 * There are no nodes that emit to multiple outputs today.  Sending data to 
multiple outputs is not simple.  This is a pipeline breaking activity and new 
tasks will need to be scheduled and the data will need to be held onto until 
each downstream task has run, and backpressure may need to be applied if one 
output is slower than the other.  Node's should not be given the choice of 
multiple outputs unless they are solving specifically that problem (e.g. 
temporary table).




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

Reply via email to