zhuqi-lucas commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922500062

   > I think the issue is for streams like AggregateExec that can always make 
progress. So while this PR wraps the input to the AggregateExecStream, I think 
logically the yielding is part of the AggregateExec itself.
   
   > However, the same example can also consume the pending return generated by 
Aggregate, so the guarantee cannot be upheld in our model.
   
   > Would some kind of cancellation token in TaskContext (something really 
simple like an AtomicBoolean wrapper) make more sense? That way streams would 
just need to call some is_cancelled method instead of counting batches, 
measuring time, etc. 
   
   
   Thank you @berkaysynnada @pepijnve @alamb , 
   
   I suggest may be we can have following steps:
   1. We keep this PR, **and add timeout cancellation mock testing if 
possible**, so we can have record for what kind of streaming is added support 
of cancellation(Here is for aggregate for both no group and group) . Because we 
at least can see no performance regression for the solution, and it can easily 
add the YieldStream to other stream if we met similar issue, it's just one line 
code. For example:
   ```rust
   let input = Box::pin(YieldStream::new(input)) as SendableRecordBatchStream;
   ```
   2. I agree add YieldStream to DataSource can reduce some potential overhead 
when we meet other streaming similar issues for cancellation. But we may add a 
follow-up PR to do the POC and also compare performance to see if it will hurt 
the performance.
   3. Cancellation token in TaskContext, i think it may add more overhead to 
performance, because we will add more if logic for the token check besides the 
yield return if logic. We also can add the performance POC compare in step2 
follow-up.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to