zhuqi-lucas commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2942862579

   > > I am onboard with the approach in this PR, and it looks good to me 
overall. Just needs some finishing touches:
   > > 
   > > * To avoid such a large diff (which is mostly plans in tests), I suggest 
gating this behind a configuration flag, which would default to `false` for 
now. We will turn it on in subsequent PRs. We just need to add a few tests with 
this flag turned on to get it exercised in CI.
   > > * In the next PR (or this PR if you want), before we turn the flag on, 
we can add an API to ask whether a leaf supports built-in yielding. If it does, 
the rule wouldn't add `YieldStreamExec` at all to that leaf. If we add built-in 
yielding support to a few common leaf nodes, and turn the configuration flag on 
after doing so, plan changes will be minimal.
   > > * In another subsequent PR, we can fix `InterleaveExec` (@berkaysynnada 
and I started discussing how one can approach this problem).
   > > * I suggest making the number of batches to yield not a constant but a 
parameter of `YieldStreamExec`. Also, when we display it, showing its child is 
not informative (we see that from the print-out already). Displaying yielding 
frequency is probably a better idea.
   > > 
   > > Thanks @zhuqi-lucas for the great work. Love it.
   > 
   > Thank you @ozankabak , i have done in the latest PR:
   > 
   > 1. Gating this behind a configuration flag, which would default to `false` 
for now.
   > 2. Only enable it for a small testing case and the end to end testing.
   > 3. Making the number of batches to yield not a constant but a parameter of 
`YieldStreamExec`, also show the frequency for the display.
   > 
   > Not done until now:
   > 
   > 1. Add an API to ask whether a leaf supports built-in yielding. If it 
does, the rule wouldn't add `YieldStreamExec` at all to that leaf. If we add 
built-in yielding support to a few common leaf nodes, and turn the 
configuration flag on after doing so, plan changes will be minimal.
   > 2. Fix `InterleaveExec` related corner cases.
   > 
   > What do you mean built-in yielding? If it means we add some leaf nodes to 
support built-in yielding support? Thanks!
   > 
   > If it means, we manually wrap some YieldStream to those execs? And after 
that, we optimize it again, so we need to exclude those operators?
   
   Updated :
   
   I have done all the remaining subtasks:
   1. Add an API to ask whether a leaf supports built-in yielding. If it does, 
the rule wouldn't add `YieldStreamExec` at all to that leaf. If we add built-in 
yielding support to a few common leaf nodes, and turn the configuration flag on 
after doing so, plan changes will be minimal.
   2. And setting DataSourceExec/LazyMemoryExec as a built-in YieldStream 
because they are the most common cases.
   3. Enabled the flag to true now, only small number slt shows we need to add 
YieldStreamExec, it means DataSourceExec/LazyMemoryExec built-in will cover 
almost all cases.
   
   Remaining things:
   1. Add InterleaveExec, etc corner cases reproduce testing.
   2. Make all the corner cases passing the testing? It also can be done by 
changing the optimizer rule to match.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to