zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922500062
> I think the issue is for streams like AggregateExec that can always make progress. So while this PR wraps the input to the AggregateExecStream, I think logically the yielding is part of the AggregateExec itself. > However, the same example can also consume the pending return generated by Aggregate, so the guarantee cannot be upheld in our model. > Would some kind of cancellation token in TaskContext (something really simple like an AtomicBoolean wrapper) make more sense? That way streams would just need to call some is_cancelled method instead of counting batches, measuring time, etc. Thank you @berkaysynnada @pepijnve @alamb , I suggest may be we can have following steps: 1. We keep this PR, **and add timeout cancellation mock testing if possible**, so we can have record for what kind of streaming is added support of cancellation(Here is for aggregate for both no group and group) . Because we at least can see no performance regression for the solution, and it can easily add the YieldStream to other stream if we met similar issue, it's just one line code. For example: ```rust let input = Box::pin(YieldStream::new(input)) as SendableRecordBatchStream; ``` 2. I agree add YieldStream to DataSource can reduce some potential overhead when we meet other streaming similar issues for cancellation. But we may add a follow-up PR to do the POC and also compare performance to see if it will hurt the performance. 3. Cancellation token in TaskContext, i think it may add more overhead to performance, because we will add more if logic for the token check besides the yield return if logic. We also can add the performance POC compare in step2 follow-up. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org