pepijnve commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922625826

   > Pipeline-breaking operators creating cancellation issues is a universal 
problem -- this is not an AggregateExec issue.
   
   @ozankabak It's indeed a universal problem, but since task cancellation is 
dependent on yielding I don't really see how you can implement this universally 
without requiring that all Future/Stream implementations are implemented in a 
way that plays nice. In other words, everybody needs to yield sometime. The 
question is then how you make it as simple and efficient as possible to make a 
"correct" implementation. Some centralized 'should I yield now?' mechanism 
would make that easier I think.
   
   > Cancellation token in TaskContext, i think it may add more overhead to 
performance, because we will add more if logic for the token check besides the 
yield return if logic. We also can add the performance POC compare in step2 
follow-up.
   
   @zhuqi-lucas a cancellation token would replace the "yield every n batches" 
logic, not supplement it. So that's one indirect AtomicBoolean read and 
condition per batch. Do you think that would be more significant than the 
counter incrementing and condition that's in place in your YieldStream?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to