Re: [PR] rfc: optional skipping partial aggregation [datafusion]

via GitHub Mon, 29 Jul 2024 06:44:18 -0700


alamb commented on PR #11627:
URL: https://github.com/apache/datafusion/pull/11627#issuecomment-2255987416


   
   > > 1000 partitions
   > 
   > @alamb this is also a bit unexpected, since the default number of rows after
   > which the check fires is 100_000, and it is applied per partition (each
   > partition processes at least 100k rows normally, without skipping
   > aggregation), while the total number of rows in the file is ~100 million (if
   > I'm not mistaken). So this optimization should not help in this case: with
   > 1000 partitions, each partition reads only ~100_000 rows anyway 🤔
   
   
   You are correct 🤔 I tested using the metric I added in
   https://github.com/apache/datafusion/pull/11706, and indeed this code path
   isn't executed.
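   The arithmetic in the quoted reasoning can be checked with a back-of-the-envelope
   sketch. This is not DataFusion code: `rows_eligible_for_skipping` and its
   parameters are hypothetical, and it assumes rows are spread evenly across
   partitions and that each partition aggregates normally until it has seen the
   probe threshold (100_000 rows by default, per the thread). The 16-partition
   case is only a hypothetical point of comparison.

```rust
/// Hypothetical model, not DataFusion's implementation: a partition only
/// decides whether to skip partial aggregation after it has consumed
/// `probe_rows_threshold` input rows.
fn rows_eligible_for_skipping(total_rows: u64, partitions: u64, probe_rows_threshold: u64) -> u64 {
    // Assumption: rows are distributed evenly across partitions
    // (guard against division by zero).
    let rows_per_partition = total_rows / partitions.max(1);
    // Rows seen before the probe fires are always aggregated normally;
    // only the remainder could bypass partial aggregation.
    rows_per_partition.saturating_sub(probe_rows_threshold)
}

fn main() {
    let total_rows = 100_000_000; // "~100 million" rows, as estimated in the thread
    let threshold = 100_000; // default probe threshold cited in the thread
    for partitions in [16, 1000] {
        let eligible = rows_eligible_for_skipping(total_rows, partitions, threshold);
        println!("{partitions} partitions: {eligible} rows per partition could skip partial aggregation");
    }
}
```

   Under these assumptions, 1000 partitions leaves 0 rows per partition that could
   skip partial aggregation, which matches the observation that the code path is
   never reached.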
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.