Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

via GitHub Fri, 30 May 2025 07:00:19 -0700


alamb commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922479564


   > Even if I couldn't get consensus on that direction, this version could 
still be improved. For example, by inserting the YieldStream only when it’s 
truly needed, such as in the empty GROUP BY case. I know having GROUP BY can 
still cause such situations, but if we’re going to fix things reactively based 
on complaints, then the fixes should consistently address similar cases. 
   
   The query described in this issue has a GROUP BY (the distinct query gets 
rewritten to a `GROUP BY`)
   -  https://github.com/apache/datafusion/issues/14036
   
   > Lastly, adding also a cancellation timeout test will be good
   
   I agree with this -- figuring out how to write such a test would be really 
valuable. Perhaps we can take inspiration from 
https://github.com/pepijnve/datafusion_cancel_test form @pepijnve 
   
   1. Create an input to the AggregateExec that always is ready and feeds the 
same batch over and over again
   2. Start it running
   3. Expect that we can cancle (drop the aggregateexec stream) and that the 
underlying input is dropped too
   
   I think the best test would be SQL level: register an infinite data source 
and then run a query
   
   @zhuqi-lucas  is this something you are willing to try?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

Reply via email to