ctsk opened a new pull request, #16165: URL: https://github.com/apache/datafusion/pull/16165
This PR hard-codes the seed used by the hash aggregation. Compared with the previous runtime-determined seed, the main benefit is that partial aggregation and final aggregation now share the same hash function. I haven't measured it, but in theory this should make the final aggregation step more efficient: the partial aggregation emits group values in an order that is clustered with respect to the final aggregation's hash table, which gives a beneficial memory access pattern when building the final aggregation. I expect a slight speedup for high-cardinality aggregations that don't trigger skipping of the partial aggregation step.
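To make the clustering argument concrete, here is a minimal, self-contained sketch. It is not DataFusion's code: `seeded_hash`, `bucket`, `emit_order`, and `backward_jumps` are hypothetical helpers, the hash is a splitmix64-style mixer standing in for the real hasher, and both stages are assumed to use a power-of-two table with the same number of buckets. It counts how often the final stage has to step backwards through its buckets when it consumes keys in the order the partial stage would emit them.

```rust
// Hypothetical seeded hash (splitmix64-style mixer). This stands in for the
// real hasher and is an assumption made for illustration only.
fn seeded_hash(key: u64, seed: u64) -> u64 {
    let mut z = (key ^ seed).wrapping_add(0x9E3779B97F4A7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
    z ^ (z >> 31)
}

// Bucket index in a table whose bucket count is a power of two.
fn bucket(hash: u64, num_buckets: u64) -> u64 {
    hash & (num_buckets - 1)
}

// Order in which a partial stage would emit keys if it walked its table
// bucket by bucket (modelled here as a stable sort by bucket index).
fn emit_order(keys: &[u64], seed: u64, num_buckets: u64) -> Vec<u64> {
    let mut ordered = keys.to_vec();
    ordered.sort_by_key(|&k| bucket(seeded_hash(k, seed), num_buckets));
    ordered
}

// Number of adjacent inserts where the final stage moves to a lower bucket
// index than the previous insert, i.e. breaks a sequential access pattern.
fn backward_jumps(order: &[u64], seed: u64, num_buckets: u64) -> usize {
    order
        .windows(2)
        .filter(|w| {
            bucket(seeded_hash(w[1], seed), num_buckets)
                < bucket(seeded_hash(w[0], seed), num_buckets)
        })
        .count()
}

fn main() {
    let keys: Vec<u64> = (0..1000).collect();
    let num_buckets = 256;
    let partial_seed = 42;
    let order = emit_order(&keys, partial_seed, num_buckets);

    // Same seed in both stages: the final stage sweeps its buckets in
    // non-decreasing order, so this count is 0.
    let shared = backward_jumps(&order, partial_seed, num_buckets);
    // Different seed in the final stage: the emit order is effectively
    // random with respect to the final table.
    let different = backward_jumps(&order, 7, num_buckets);

    println!("backward jumps, shared seed:    {shared}");
    println!("backward jumps, different seed: {different}");
}
```

With a shared seed the final stage touches its buckets in one forward sweep, which is the clustered access pattern described above. Real hash tables differ in sizing and probing, so this only illustrates the ordering effect and says nothing about the size of any actual speedup.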