Re: [I] Improve performance of high cardinality grouping by reusing hash values [datafusion]

via GitHub Tue, 30 Jul 2024 17:43:23 -0700


jayzhan211 commented on issue #11680:
URL: https://github.com/apache/datafusion/issues/11680#issuecomment-2259425914


   The experiment I did in #11708 shows that:
   1. There is not much difference for ClickBench Q17
   2. It performs better at high cardinality (2,000,000 rows, all values 
distinct)
   3. It greatly simplifies the hash repartitioning code
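   The idea of reusing hash values can be sketched as follows. This is a minimal, std-only Rust sketch that assumes a two-stage plan. A repartition step hashes each row's group key once, routes the row by that hash, and carries the hash along with the row. The aggregation step then uses the carried hash to probe its group table, so no key is hashed a second time, even when the table grows. The names hash_key, repartition and GroupTable are hypothetical. The sketch does not reproduce the code in #11708 or DataFusion's actual hashing and hash table. The table derives its slot from the upper bits of the hash because the lower bits already decided the partition, so every row in a partition shares them.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: hash a group key exactly once.
fn hash_key(key: &str) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    h.finish()
}

/// Stage 1 (repartition): route each row to `hash % num_partitions`
/// and keep the hash next to the key so later stages can reuse it.
fn repartition(keys: &[&str], num_partitions: usize) -> Vec<Vec<(u64, String)>> {
    let mut parts = vec![Vec::new(); num_partitions];
    for k in keys {
        let h = hash_key(k);
        parts[(h % num_partitions as u64) as usize].push((h, k.to_string()));
    }
    parts
}

/// Slot index from the upper hash bits (the lower bits chose the partition).
fn slot(hash: u64, mask: usize) -> usize {
    (hash.rotate_left(32) as usize) & mask
}

/// Stage 2 (aggregate): hypothetical open-addressing group table that
/// stores the precomputed hash of every group and never rehashes keys.
struct GroupTable {
    slots: Vec<usize>,              // index into `groups`, usize::MAX = empty
    groups: Vec<(u64, String, u64)>, // (hash, key, count)
}

impl GroupTable {
    fn with_capacity(cap: usize) -> Self {
        let n = (cap * 2).next_power_of_two().max(8);
        Self { slots: vec![usize::MAX; n], groups: Vec::new() }
    }

    /// Count one row whose key hash was computed during repartitioning.
    fn add(&mut self, hash: u64, key: &str) {
        // Keep the load factor at or below 1/2 so probing always ends.
        if (self.groups.len() + 1) * 2 > self.slots.len() {
            self.grow();
        }
        let mask = self.slots.len() - 1;
        let mut i = slot(hash, mask);
        loop {
            let g = self.slots[i];
            if g == usize::MAX {
                self.slots[i] = self.groups.len();
                self.groups.push((hash, key.to_string(), 1));
                return;
            }
            // Cheap hash comparison first; compare keys only on a hash match.
            if self.groups[g].0 == hash && self.groups[g].1 == key {
                self.groups[g].2 += 1;
                return;
            }
            i = (i + 1) & mask; // linear probing
        }
    }

    /// Double the slot array, re-placing groups by their stored hashes.
    fn grow(&mut self) {
        let mask = self.slots.len() * 2 - 1;
        let mut slots = vec![usize::MAX; mask + 1];
        for (idx, (h, _, _)) in self.groups.iter().enumerate() {
            let mut i = slot(*h, mask);
            while slots[i] != usize::MAX {
                i = (i + 1) & mask;
            }
            slots[i] = idx;
        }
        self.slots = slots;
    }
}

fn main() {
    let keys = ["a", "b", "a", "c", "b", "a"];
    let mut total_groups = 0;
    for part in &repartition(&keys, 4) {
        let mut table = GroupTable::with_capacity(part.len());
        for (h, k) in part {
            table.add(*h, k); // reuse the hash from repartitioning
        }
        total_groups += table.groups.len();
    }
    println!("{total_groups}"); // prints 3
}
```

   Because each key always lands in the same partition, the per-partition tables together contain one entry per distinct key. The only hashing cost is the single hash_key call per row in stage 1.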
   
   @alamb If the benchmark code looks good to you, I think we could reuse the 
hash values.
   To improve ClickBench Q17 further: the bottleneck is now arrow::Row 
(RowConverter::append and Rows::push). Do you think there is room for 
improvement there, or should we find a way to reduce our reliance on Rows by 
design?
   

