zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922532031
> > Even if I couldn't get consensus on that direction, this version could still be improved. For example, by inserting the YieldStream only when it’s truly needed, such as in the empty GROUP BY case. I know having GROUP BY can still cause such situations, but if we’re going to fix things reactively based on complaints, then the fixes should consistently address similar cases. > > The query described in this issue has a GROUP BY (the distinct query gets rewritten to a `GROUP BY`) > > * [Un-cancellable Query when hitting many large files. #14036](https://github.com/apache/datafusion/issues/14036) > > > Lastly, adding also a cancellation timeout test will be good > > I agree with this -- figuring out how to write such a test would be really valuable. Perhaps we can take inspiration from https://github.com/pepijnve/datafusion_cancel_test form @pepijnve > > 1. Create an input to the AggregateExec that always is ready and feeds the same batch over and over again > 2. Start it running > 3. Expect that we can cancle (drop the aggregateexec stream) and that the underlying input is dropped too > > I think the best test would be SQL level: register an infinite data source and then run a query > > @zhuqi-lucas is this something you are willing to try? Sure @alamb , i will try this to add mock testing the timeout for cancellation. Thanks. > Pipeline-breaking operators creating cancellation issues is a universal problem -- this is not an AggregateExec issue. Therefore, I firmly believe that the solution has to be universal as well -- I don't agree with adding a YieldStream type construct to all such operators. Thank you @ozankabak , if so, let me also try the solution for low level unified solution which add to datasource. Following steps maybe: 1. When we have the mock testing, we can make sure the unified solution will work also. 2. Run performance testing for the unified solution, and make sure it no regression. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org