ozankabak commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2939817555
@alamb, I don't see how your two points materialize, but maybe I'm missing something. Let's analyze each one by one. >1. Only ExecutionPlans that don't have some other yield point are penalized with the yield. With YieldExec all plans get changed, even those that don't need it now will yield Why do you think this is the case? The rule only inserts `YieldExec`s if the plan involves pipeline-breaking, and only once at leaves. I think the penalty/overhead will alway be strictly less in this approach. >2. The explain plan is now more complicated (and various optimizers that use as_any to check for types may need to be changed). Most people looking at a plan should not have to care about yielding is done This is true if we merge the PR directly as is and do not complete the work, but not if we add an API to ask whether a leaf supports built-in yielding. If it does, the rule wouldn't add YieldStreamExec at all to that leaf. If we add built-in yielding support to a few common leaf nodes, plan changes will be minimal while retaining the efficiency above. Here is the relevant message I sent before detailing this: > To avoid such a large diff (which is mostly plans in tests), I suggest gating this behind a configuration flag, which would default to false for now. We will turn it on in subsequent PRs. We just need to add a few tests with this flag turned on to get it exercised in CI. > In the next PR (or this PR if you want), before we turn the flag on, we can add an API to ask whether a leaf supports built-in yielding. If it does, the rule wouldn't add YieldStreamExec at all to that leaf. If we add built-in yielding support to a few common leaf nodes, and turn the configuration flag on after doing so, plan changes will be minimal. We may end up disagreeing for a while until we find a good solution (and that's fine, this is how engineering works). However, we should at least agree on why we are disagreeing, otherwise it will be hard to ultimately find a good solution 🙂 I'm optimistic that we will. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org