zhuqi-lucas commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2922532031

   > > Even if I couldn't get consensus on that direction, this version could 
still be improved. For example, by inserting the YieldStream only when it’s 
truly needed, such as in the empty GROUP BY case. I know having GROUP BY can 
still cause such situations, but if we’re going to fix things reactively based 
on complaints, then the fixes should consistently address similar cases.
   > 
   > The query described in this issue has a GROUP BY (the distinct query gets 
rewritten to a `GROUP BY`)
   > 
   > * [Un-cancellable Query when hitting many large files. 
#14036](https://github.com/apache/datafusion/issues/14036)
   > 
   > > Lastly, adding also a cancellation timeout test will be good
   > 
   > I agree with this -- figuring out how to write such a test would be really 
valuable. Perhaps we can take inspiration from 
https://github.com/pepijnve/datafusion_cancel_test form @pepijnve
   > 
   > 1. Create an input to the AggregateExec that always is ready and feeds the 
same batch over and over again
   > 2. Start it running
   > 3. Expect that we can cancle (drop the aggregateexec stream) and that the 
underlying input is dropped too
   > 
   > I think the best test would be SQL level: register an infinite data source 
and then run a query
   > 
   > @zhuqi-lucas is this something you are willing to try?
   
   Sure @alamb , i will try this to add mock testing the timeout for 
cancellation. Thanks.
   
   
   > Pipeline-breaking operators creating cancellation issues is a universal 
problem -- this is not an AggregateExec issue. Therefore, I firmly believe that 
the solution has to be universal as well -- I don't agree with adding a 
YieldStream type construct to all such operators.
   
   Thank you @ozankabak , if so, let me also try the solution for low level 
unified solution which add to datasource.
   
   Following steps maybe:
   1. When we have the mock testing, we can make sure the unified solution will 
work also.
   2. Run performance testing for the unified solution, and make sure it no 
regression.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to