zhuqi-lucas commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2930489390

   > > Making the rule smarter, by utilizing EmissionType information, so that 
it only adds YieldExecs when necessary. If the path from a leaf to the root 
does not involve any operator that is pipeline-breaking, there is no need to 
insert a YieldExec as the parent of that leaf. I think this is similar to 
@pepijnve's thinking.
   > 
   > My thinking was that we could use EmissionType to insert the yield wrapper 
closer to where it's needed rather than at the leaves.
   > 
   > So if you have
   > 
   > ```
   > AggregateExec -> final
   >   FilterExec -> copy input behavior
   >     ProjectionExec -> copy input behavior
   >       DataSourceExec -> incremental
   > ```
   > 
   > rather than adding yield as parent of the leaves
   > 
   > ```
   > AggregateExec -> final
   >   FilterExec -> copy input behavior
   >     ProjectionExec -> copy input behavior
   >       YieldExec -> copy input behavior
   >         DataSourceExec -> incremental
   > ```
   > 
   > you would add it as parent of the children of the node with emission type 
final
   > 
   > ```
   > AggregateExec -> final
   >   YieldExec -> copy input behavior
   >     FilterExec -> copy input behavior
   >       ProjectionExec -> copy input behavior
   >         DataSourceExec -> incremental
   > ```
   > 
   > I'll admit that I didn't work out all the possible scenarios, but my 
reasoning is that if you have more than one pipeline-breaking operator in a 
chain, this will work more reliably, since you're 'fixing' each of them 
rather than injecting yield points at the leaves and hoping this propagates 
through the entire chain. An additional benefit might be that the plan 
transformation is simpler to implement this way, since it only requires very 
local context (i.e. the 'final' nodes themselves).
   
   That makes sense, and it's a smart solution! I will try to address it.
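
As a rough illustration of the suggestion quoted above, here is a minimal sketch of the transformation on a toy plan tree. The `Node` and `Emission` types and the `insert_yields`, `render`, and `example_plan` functions are hypothetical stand-ins written for this example; they are not DataFusion's `ExecutionPlan` API, and the real rule may need to handle cases this sketch ignores.

```rust
// Toy model only: hypothetical stand-ins, not DataFusion's ExecutionPlan API.
#[derive(Debug, Clone, PartialEq)]
enum Emission {
    Incremental,
    Final,
    CopyInput, // "copy input behavior" in the plans quoted above
}

#[derive(Debug, Clone, PartialEq)]
struct Node {
    name: String,
    emission: Emission,
    children: Vec<Node>,
}

impl Node {
    fn new(name: &str, emission: Emission, children: Vec<Node>) -> Self {
        Node { name: name.to_string(), emission, children }
    }
}

/// Rewrite the tree bottom-up, wrapping each child of a `Final` node in a
/// YieldExec. Only the `Final` node itself is inspected (local context).
fn insert_yields(node: Node) -> Node {
    let is_final = node.emission == Emission::Final;
    let children: Vec<Node> = node
        .children
        .into_iter()
        .map(insert_yields)
        .map(|child| {
            // Skip children that are already wrapped so the rewrite is idempotent.
            if is_final && child.name != "YieldExec" {
                Node::new("YieldExec", Emission::CopyInput, vec![child])
            } else {
                child
            }
        })
        .collect();
    Node { name: node.name, emission: node.emission, children }
}

/// Render the plan as indented operator names, one per line.
fn render(node: &Node, depth: usize, out: &mut String) {
    out.push_str(&"  ".repeat(depth));
    out.push_str(&node.name);
    out.push('\n');
    for child in &node.children {
        render(child, depth + 1, out);
    }
}

/// The four-operator plan from the quoted comment.
fn example_plan() -> Node {
    let source = Node::new("DataSourceExec", Emission::Incremental, vec![]);
    let projection = Node::new("ProjectionExec", Emission::CopyInput, vec![source]);
    let filter = Node::new("FilterExec", Emission::CopyInput, vec![projection]);
    Node::new("AggregateExec", Emission::Final, vec![filter])
}
```

Calling `insert_yields(example_plan())` and passing the result to `render` gives the third plan shape from the quoted comment, with YieldExec directly under AggregateExec. With several `Final` nodes in a chain, each one gets its own YieldExec child, which is the "fixing each of them" behavior described in the quote.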


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

