alamb commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922479564
> Even if I couldn't get consensus on that direction, this version could still be improved. For example, by inserting the YieldStream only when it’s truly needed, such as in the empty GROUP BY case. I know having GROUP BY can still cause such situations, but if we’re going to fix things reactively based on complaints, then the fixes should consistently address similar cases. The query described in this issue has a GROUP BY (the distinct query gets rewritten to a `GROUP BY`) - https://github.com/apache/datafusion/issues/14036 > Lastly, adding also a cancellation timeout test will be good I agree with this -- figuring out how to write such a test would be really valuable. Perhaps we can take inspiration from https://github.com/pepijnve/datafusion_cancel_test form @pepijnve 1. Create an input to the AggregateExec that always is ready and feeds the same batch over and over again 2. Start it running 3. Expect that we can cancle (drop the aggregateexec stream) and that the underlying input is dropped too I think the best test would be SQL level: register an infinite data source and then run a query @zhuqi-lucas is this something you are willing to try? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org