Re: [I] Improve performance of high cardinality grouping by reusing hash values [datafusion]

via GitHub Thu, 01 Aug 2024 13:18:56 -0700


alamb commented on issue #11680:
URL: https://github.com/apache/datafusion/issues/11680#issuecomment-2263910990


   > #11762
   > 
   > I think what I need to do is find a query that is currently slower in 
single mode, and find a way to optimize it the way partial/final does, but 
within a single execution node? 🤔
   
   What exactly does `single_mode` do? 
https://github.com/apache/datafusion/pull/11762 looks like maybe it just uses a 
single group by node?
   
   
   > Does anyone know what kind of query partial/final group by is good 
at?
   
   
   I think they are good at being able to use multiple cores to do the work 
in parallel.
   
   They are especially good at low cardinality aggregates (some of the TPC-H 
ones, for example, where there are 4 distinct groups), because the hash tables 
are small and the final shuffle is very small.
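
To make the low cardinality point concrete, here is a minimal sketch of a two-phase (partial/final) count in plain Python. It is not DataFusion code: the function names are made up for illustration, the final phase is a single merge rather than a hash repartition across several final nodes, and Python threads only stand in for real parallel workers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(partition):
    """Phase 1 (hypothetical name): per-partition hash table of group -> count."""
    counts = Counter()
    for key in partition:
        counts[key] += 1
    return counts


def final_aggregate(partials):
    """Phase 2 (hypothetical name): merge the partial hash tables into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def two_phase_count(partitions):
    """Run the partial phase per partition, then merge.

    Returns the final counts and the number of (group, count) entries that
    would have to be sent from the partial phase to the final phase.
    """
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_aggregate, partitions))
    shuffled_entries = sum(len(p) for p in partials)
    return final_aggregate(partials), shuffled_entries


# Low cardinality: 4 partitions of 1000 rows, only 4 distinct groups.
low = [[i % 4 for i in range(1000)] for _ in range(4)]
# High cardinality: 4 partitions of 1000 rows, every row is its own group.
high = [list(range(p * 1000, (p + 1) * 1000)) for p in range(4)]

low_result, low_shuffled = two_phase_count(low)
high_result, high_shuffled = two_phase_count(high)

print(low_shuffled, high_shuffled)
```

In the low cardinality case each partial hash table holds only 4 entries, so the partial phase shrinks 4000 input rows to 16 entries before the final phase. In the high cardinality case the partial phase reduces nothing, and all 4000 entries still reach the final phase.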
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

