[I] Improve performance of COUNT aggregates [datafusion-comet]

via GitHub Wed, 31 Jul 2024 09:25:16 -0700


andygrove opened a new issue, #744:
URL: https://github.com/apache/datafusion-comet/issues/744


   ### What is the problem the feature request solves?
   
   The benchmarks in `CometAggregateBenchmark` show that `COUNT` with Comet (Scan, Exec) is slower than Spark (0.6X), while `SUM` in the same configuration is faster than Spark (2.5X). Two such similar aggregates should not differ this much. I could not reproduce the performance difference in standalone DataFusion.
   
   ### SUM
   
   ```
   Grouped HashAgg Exec: single group key (cardinality 1048576), single aggregate SUM:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -------------------------------------------------------------------------------------------------------------------------------------------------------------------
   SQL Parquet - Spark (SUM)                                                                     1672           1698          37          6.3         159.4       1.0X
   SQL Parquet - Comet (Scan) (SUM)                                                              1913           1993         112          5.5         182.5       0.9X
   SQL Parquet - Comet (Scan, Exec) (SUM)                                                         669            798         113         15.7          63.8       2.5X
   ```
   
   ### COUNT
   
   ```
   Grouped HashAgg Exec: single group key (cardinality 1048576), single aggregate COUNT:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
   SQL Parquet - Spark (COUNT)                                                                     1796           1827          43          5.8         171.3       1.0X
   SQL Parquet - Comet (Scan) (COUNT)                                                              1810           1853          61          5.8         172.6       1.0X
   SQL Parquet - Comet (Scan, Exec) (COUNT)                                                        2827           2867          56          3.7         269.6       0.6X
   ```
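
To make the gap concrete, here is a minimal Python sketch that recomputes the `Relative` column from the `Best Time(ms)` values in the two tables above. It assumes that `Relative` is the Spark best time divided by each case's best time. That definition is an assumption, not something stated in this issue, but it matches the reported numbers. The names `best_ms` and `relative_to_spark` are illustrative only.

```python
# Best Time (ms) values copied from the two benchmark tables above.
best_ms = {
    "SUM": {"Spark": 1672, "Comet (Scan)": 1913, "Comet (Scan, Exec)": 669},
    "COUNT": {"Spark": 1796, "Comet (Scan)": 1810, "Comet (Scan, Exec)": 2827},
}


def relative_to_spark(times):
    """Assumed definition of the Relative column: Spark best time / case best time."""
    baseline = times["Spark"]
    return {case: round(baseline / ms, 1) for case, ms in times.items()}


relative = {agg: relative_to_spark(times) for agg, times in best_ms.items()}

# How much longer COUNT takes than SUM within the Comet (Scan, Exec) case.
count_vs_sum = (
    best_ms["COUNT"]["Comet (Scan, Exec)"] / best_ms["SUM"]["Comet (Scan, Exec)"]
)

for agg, cases in relative.items():
    for case, rel in cases.items():
        print(f"{agg:5} {case:20} {rel}X")
print(f"COUNT / SUM best time, Comet (Scan, Exec): {count_vs_sum:.1f}x")
```

Under that assumption, the Comet (Scan, Exec) best time for `COUNT` is roughly four times that for `SUM`, even though the Spark baselines for the two aggregates are within about 10% of each other.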
   
   
   ### Describe the potential solution
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


