ctsk opened a new pull request, #16165:
URL: https://github.com/apache/datafusion/pull/16165

   This PR hard-codes the seed for the hash aggregation. The main benefit 
compared to the previous runtime-determined seed is that, with this PR 
applied, partial aggregation and final aggregation share the same hash function.
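
   To illustrate the difference between a runtime-determined seed and a 
hard-coded one, here is a minimal sketch using only the Rust standard library. 
The std hashers are stand-ins chosen for illustration, not the hasher this PR 
touches, and `hash_with`, `fixed_state`, and `random_state` are hypothetical 
helper names.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};

/// Hash a single key with whatever hasher the given state builds.
fn hash_with<S: BuildHasher>(state: &S, key: u64) -> u64 {
    let mut hasher = state.build_hasher();
    key.hash(&mut hasher);
    hasher.finish()
}

/// Stand-in for a hard-coded seed: every state built this way hashes
/// identically, so two independent stages agree on every key's hash.
fn fixed_state() -> BuildHasherDefault<DefaultHasher> {
    BuildHasherDefault::default()
}

/// Stand-in for a runtime-determined seed: each call yields fresh keys,
/// so two stages built independently will generally disagree.
fn random_state() -> RandomState {
    RandomState::new()
}
```

Two separately constructed `fixed_state()` values produce the same hash for 
the same key, which is the property the partial and final aggregation gain 
here. Two `random_state()` values generally do not.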
   
   I haven't measured it, but in theory this should make the final aggregation 
step more efficient: the partial aggregation will emit the group values in an 
order that is clustered in the final aggregation hash table, which gives a 
beneficial memory access pattern when building the final aggregation.
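
   The clustering argument can be sketched with a toy model. Everything here 
is an assumption made for illustration: `seeded_hash` is a hypothetical 
splitmix64-style mixer rather than the real hash function, both stages use a 
power-of-two table of the same size indexed by the low hash bits, and the 
partial stage is assumed to emit groups in bucket order. It is not a 
measurement of DataFusion itself.

```rust
/// Hypothetical seeded mixer (splitmix64-style finalizer); a stand-in for
/// the real hash function, used only for this toy model.
fn seeded_hash(key: u64, seed: u64) -> u64 {
    let mut z = (key ^ seed).wrapping_add(0x9E37_79B9_7F4A_7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Bucket index in a table with `buckets` slots (must be a power of two).
fn bucket(key: u64, seed: u64, buckets: u64) -> u64 {
    seeded_hash(key, seed) & (buckets - 1)
}

/// Assumed emission order of the partial stage: groups sorted by the
/// bucket they occupy in the partial stage's table.
fn emit_order(keys: &[u64], seed: u64, buckets: u64) -> Vec<u64> {
    let mut out = keys.to_vec();
    out.sort_by_key(|&k| bucket(k, seed, buckets));
    out
}

/// Sum of distances between consecutively touched buckets in the final
/// stage's table; smaller means more clustered memory access.
fn total_jump(arrivals: &[u64], seed: u64, buckets: u64) -> u64 {
    arrivals
        .windows(2)
        .map(|w| bucket(w[0], seed, buckets).abs_diff(bucket(w[1], seed, buckets)))
        .sum()
}
```

With the same seed in both stages, `total_jump` over the emitted order is 
bounded by the table size, because the final stage walks its buckets in 
non-decreasing order. With different seeds, consecutive inserts land in 
unrelated buckets and the total grows with the number of groups.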
   
   I expect a small speedup for large-cardinality aggregations that don't 
trigger the skipping of the partial aggregation step.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

