alamb commented on PR #11627:
URL: https://github.com/apache/datafusion/pull/11627#issuecomment-2255987416
> > 1000 partitions
>
> @alamb this is also a bit unexpected. The default number of rows after which the check fires is 100_000, and it is applied per partition (each partition processes its first 100k rows with normal aggregation before it can start skipping). The total number of rows in the file is ~100M (if I'm not mistaken). So this optimization should not help in this case: with 1000 partitions, each partition reads only ~100_000 rows anyway 🤔

You are correct 🤔 I tested using the metric I added in https://github.com/apache/datafusion/pull/11706, and indeed this codepath isn't executed.
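The arithmetic behind the quoted reasoning can be sketched as follows. This is a minimal illustration, not DataFusion code: `rows_after_probe` is a hypothetical helper, it assumes rows are split evenly across partitions (real file splits may be skewed), and the 16-partition case is an arbitrary comparison point. It uses only the figures from the comment: ~100M total rows and a per-partition threshold of 100_000.

```rust
/// Hypothetical helper (not a DataFusion API). Assuming `total_rows` is split
/// evenly across `partitions`, return how many rows per partition arrive
/// after the skip-aggregation probe threshold is reached, i.e. the rows that
/// could possibly bypass partial aggregation.
fn rows_after_probe(total_rows: u64, partitions: u64, probe_rows_threshold: u64) -> u64 {
    // Even-split assumption; actual partition sizes depend on the scan.
    let rows_per_partition = total_rows / partitions;
    // saturating_sub returns 0 when a partition never exceeds the threshold.
    rows_per_partition.saturating_sub(probe_rows_threshold)
}

fn main() {
    let total_rows = 100_000_000; // ~100M rows, approximate figure from the comment
    let threshold = 100_000; // default per-partition threshold, per the comment

    // 16 is an arbitrary comparison point; 1000 is the case discussed above.
    for partitions in [16, 1000] {
        let after = rows_after_probe(total_rows, partitions, threshold);
        println!(
            "{partitions:>5} partitions: {} rows/partition, {after} rows after probe",
            total_rows / partitions
        );
    }
}
```

With 1000 partitions each partition sees about 100_000 rows, so no rows remain once the probe threshold is reached and the skipping path has nothing to act on. That matches the observation that the codepath is not executed. With 16 partitions, roughly 6.15M rows per partition would arrive after the probe.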
