zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2942862579
> > I am onboard with the approach in this PR, and it looks good to me overall. Just needs some finishing touches: > > > > * To avoid such a large diff (which is mostly plans in tests), I suggest gating this behind a configuration flag, which would default to `false` for now. We will turn it on in subsequent PRs. We just need to add a few tests with this flag turned on to get it exercised in CI. > > * In the next PR (or this PR if you want), before we turn the flag on, we can add an API to ask whether a leaf supports built-in yielding. If it does, the rule wouldn't add `YieldStreamExec` at all to that leaf. If we add built-in yielding support to a few common leaf nodes, and turn the configuration flag on after doing so, plan changes will be minimal. > > * In another subsequent PR, we can fix `InterleaveExec` (@berkaysynnada and I started discussing how one can approach this problem). > > * I suggest making the number of batches to yield not a constant but a parameter of `YieldStreamExec`. Also, when we display it, showing its child is not informative (we see that from the print-out already). Displaying yielding frequency is probably a better idea. > > > > Thanks @zhuqi-lucas for the great work. Love it. > > Thank you @ozankabak , i have done in the latest PR: > > 1. Gating this behind a configuration flag, which would default to `false` for now. > 2. Only enable it for a small testing case and the end to end testing. > 3. Making the number of batches to yield not a constant but a parameter of `YieldStreamExec`, also show the frequency for the display. > > Not done until now: > > 1. Add an API to ask whether a leaf supports built-in yielding. If it does, the rule wouldn't add `YieldStreamExec` at all to that leaf. If we add built-in yielding support to a few common leaf nodes, and turn the configuration flag on after doing so, plan changes will be minimal. > 2. Fix `InterleaveExec` related corner cases. > > What do you mean built-in yielding? If it means we add some leaf nodes to support built-in yielding support? Thanks! > > If it means, we manually wrap some YieldStream to those execs? And after that, we optimize it again, so we need to exclude those operators? Updated : I have done all the remaining subtasks: 1. Add an API to ask whether a leaf supports built-in yielding. If it does, the rule wouldn't add `YieldStreamExec` at all to that leaf. If we add built-in yielding support to a few common leaf nodes, and turn the configuration flag on after doing so, plan changes will be minimal. 2. And setting DataSourceExec/LazyMemoryExec as a built-in YieldStream because they are the most common cases. 3. Enabled the flag to true now, only small number slt shows we need to add YieldStreamExec, it means DataSourceExec/LazyMemoryExec built-in will cover almost all cases. Remaining things: 1. Add InterleaveExec, etc corner cases reproduce testing. 2. Make all the corner cases passing the testing? It also can be done by changing the optimizer rule to match. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org