Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

via GitHub Fri, 06 Jun 2025 02:14:27 -0700


pepijnve commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2948608948


   > most CPU/IO cost still comes from predicate evaluation or hash‐join 
builds, etc. The variability you’re seeing in PR 16262’s benchmarks is 
therefore probably just noise rather than a real performance regression.
   
   That is my expectation as well. I wanted to evaluate the query @ozankabak 
referred to with many nested pipeline blockers. It's kind of hard to assess the 
impact with a noisy measuring device.
   
   @zhuqi-lucas I would still like to pursue the alternative approach 
(basically an extended version of what you originally proposed), but I'm reaching 
the limits of my current Rust skills and would like to solicit some input from 
others. I don't want to do that here, though, to keep this PR focused on the 
optimizer rule approach. Would you be offended if I made a secondary draft PR 
that I can point people to where I clearly state it's a potential alternative 
for this one but not intended to replace it? I would like to write up the 
various design tradeoffs I've been looking at, but this comment thread is not 
the right place. Not sure where else would be appropriate besides another PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

