pepijnve commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922625826
> Pipeline-breaking operators creating cancellation issues is a universal problem -- this is not an AggregateExec issue. @ozankabak It's indeed a universal problem, but since task cancellation is dependent on yielding I don't really see how you can implement this universally without requiring that all Future/Stream implementations are implemented in a way that plays nice. In other words, everybody needs to yield sometime. The question is then how you make it as simple and efficient as possible to make a "correct" implementation. Some centralized 'should I yield now?' mechanism would make that easier I think. > Cancellation token in TaskContext, i think it may add more overhead to performance, because we will add more if logic for the token check besides the yield return if logic. We also can add the performance POC compare in step2 follow-up. @zhuqi-lucas a cancellation token would replace the "yield every n batches" logic, not supplement it. So that's one indirect AtomicBoolean read and condition per batch. Do you think that would be more significant than the counter incrementing and condition that's in place in your YieldStream? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org