zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2930489390
> > Making the rule smarter, by utilizing EmissionType information, so that it only adds YieldExecs when necessary. If the path from a leaf to the root does not involve any operator that is pipeline-breaking, there is no need to insert a YieldExec as the parent of that leaf. I think this is similar to @pepijnve's thinking.
>
> My thinking was that we could use EmissionType to insert the yield wrapper closer to where it's needed rather than at the leaves.
>
> So if you have
>
> ```
> AggregateExec -> final
>   FilterExec -> copy input behavior
>     ProjectionExec -> copy input behavior
>       DataSourceExec -> incremental
> ```
>
> rather than adding yield as parent of the leaves
>
> ```
> AggregateExec -> final
>   FilterExec -> copy input behavior
>     ProjectionExec -> copy input behavior
>       YieldExec -> copy input behavior
>         DataSourceExec -> incremental
> ```
>
> you would add it as parent of the children of the node with emission type final
>
> ```
> AggregateExec -> final
>   YieldExec -> copy input behavior
>     FilterExec -> copy input behavior
>       ProjectionExec -> copy input behavior
>         DataSourceExec -> incremental
> ```
>
> I'll admit that I didn't work out all the possible scenarios, but my reasoning is that if you have more than one pipeline-breaking operator in a chain, this will work more reliably, since you're 'fixing' each of them rather than injecting yield points at the leaves and hoping this propagates through the entire chain. An additional benefit might be that the plan transformation is simpler to implement this way, since it only requires very local context (i.e. the 'final' nodes themselves).

That makes sense, and it is a smart solution! I will try to address it.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
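For reference, the transformation proposed in the quoted comment can be sketched on a toy plan tree. This is only an illustration under stated assumptions: `Node`, `Emission`, `insert_yields`, `example_plan`, and `render_lines` are hypothetical names invented for this sketch, not DataFusion's `ExecutionPlan` or optimizer-rule API, and `CopyInput` merely stands in for the "copy input behavior" label used in the plans above.

```rust
// Toy model of a physical plan; NOT DataFusion's real types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Emission {
    Incremental, // emits output as input arrives
    Final,       // emits only after consuming all input (pipeline-breaking)
    CopyInput,   // behaves like its input (hypothetical label for this sketch)
}

#[derive(Debug, Clone, PartialEq)]
struct Node {
    name: String,
    emission: Emission,
    children: Vec<Node>,
}

impl Node {
    fn new(name: &str, emission: Emission, children: Vec<Node>) -> Self {
        Node { name: name.to_string(), emission, children }
    }
}

/// Wrap every child of a `Final` node in a YieldExec, recursing bottom-up.
/// Only local context (the node's own emission type) is needed.
fn insert_yields(node: Node) -> Node {
    let is_final = node.emission == Emission::Final;
    let children = node
        .children
        .into_iter()
        .map(insert_yields)
        .map(|child| {
            // Skip children that are already yield wrappers so the
            // transformation can be applied more than once safely.
            if is_final && child.name != "YieldExec" {
                Node::new("YieldExec", Emission::CopyInput, vec![child])
            } else {
                child
            }
        })
        .collect();
    Node { name: node.name, emission: node.emission, children }
}

/// The four-operator chain from the quoted comment.
fn example_plan() -> Node {
    let source = Node::new("DataSourceExec", Emission::Incremental, vec![]);
    let projection = Node::new("ProjectionExec", Emission::CopyInput, vec![source]);
    let filter = Node::new("FilterExec", Emission::CopyInput, vec![projection]);
    Node::new("AggregateExec", Emission::Final, vec![filter])
}

/// Render the tree as indented operator names, one per line.
fn render_lines(node: &Node) -> Vec<String> {
    fn walk(node: &Node, depth: usize, out: &mut Vec<String>) {
        out.push(format!("{}{}", "  ".repeat(depth), node.name));
        for child in &node.children {
            walk(child, depth + 1, out);
        }
    }
    let mut out = Vec::new();
    walk(node, 0, &mut out);
    out
}
```

Calling `render_lines(&insert_yields(example_plan()))` places `YieldExec` directly under `AggregateExec`, matching the third plan in the quoted comment. A plan with no `Final` node is returned unchanged, which is the "only add YieldExecs when necessary" behavior discussed above.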
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org