andygrove opened a new issue, #744:
URL: https://github.com/apache/datafusion-comet/issues/744
### What is the problem the feature request solves?
The benchmarks in `CometAggregateBenchmark` show that `COUNT` is slower in
Comet than in Spark (0.6X with native execution), while `SUM` is faster
(2.5X). There should not be such a large difference between these two
aggregates. I could not reproduce the performance difference in standalone
DataFusion.
### SUM
```
Grouped HashAgg Exec: single group key (cardinality 1048576), single aggregate SUM:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL Parquet - Spark (SUM)                                                                     1672          1698         37        6.3        159.4      1.0X
SQL Parquet - Comet (Scan) (SUM)                                                              1913          1993        112        5.5        182.5      0.9X
SQL Parquet - Comet (Scan, Exec) (SUM)                                                         669           798        113       15.7         63.8      2.5X
```
### COUNT
```
Grouped HashAgg Exec: single group key (cardinality 1048576), single aggregate COUNT:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
---------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL Parquet - Spark (COUNT)                                                                     1796          1827         43        5.8        171.3      1.0X
SQL Parquet - Comet (Scan) (COUNT)                                                              1810          1853         61        5.8        172.6      1.0X
SQL Parquet - Comet (Scan, Exec) (COUNT)                                                        2827          2867         56        3.7        269.6      0.6X
```
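To make the gap concrete, the `Relative` column can be recomputed from the `Best Time(ms)` values in the two tables. The following is a minimal sketch, assuming `Relative` is the Spark baseline's best time divided by each case's best time; the names `best_ms`, `relative`, and `relative_table` are illustrative and not part of Comet or Spark.

```python
# Best times (ms) copied from the SUM and COUNT benchmark output above.
best_ms = {
    "SUM": {"Spark": 1672, "Comet (Scan)": 1913, "Comet (Scan, Exec)": 669},
    "COUNT": {"Spark": 1796, "Comet (Scan)": 1810, "Comet (Scan, Exec)": 2827},
}


def relative(baseline_ms, case_ms):
    """Speedup over the baseline; values above 1.0 mean faster than Spark.

    Assumption: Relative = baseline best time / case best time.
    """
    return baseline_ms / case_ms


def relative_table(times):
    """Recompute the Relative column (one decimal place) for each aggregate."""
    table = {}
    for agg, cases in times.items():
        baseline = cases["Spark"]
        table[agg] = {
            name: round(relative(baseline, ms), 1) for name, ms in cases.items()
        }
    return table


for agg, cases in relative_table(best_ms).items():
    for name, rel in cases.items():
        print(f"{agg:<5} {name:<20} {rel:.1f}X")

# How much longer COUNT takes than SUM under Comet native execution.
count_vs_sum = (
    best_ms["COUNT"]["Comet (Scan, Exec)"] / best_ms["SUM"]["Comet (Scan, Exec)"]
)
print(f"COUNT / SUM best time with Comet (Scan, Exec): {count_vs_sum:.1f}x")
```

With these numbers, `COUNT` takes roughly four times as long as `SUM` under Comet native execution, while the Spark baselines for the two aggregates differ by under 10%.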
### Describe the potential solution
_No response_
### Additional context
_No response_