Yu Xu created CALCITE-7484:
------------------------------
Summary: Add AggregateFunctionOfGroupByKeysRule to eliminate
redundant aggregates over GROUP BY keys
Key: CALCITE-7484
URL: https://issues.apache.org/jira/browse/CALCITE-7484
Project: Calcite
Issue Type: Improvement
Components: core
Affects Versions: 1.41.0
Reporter: Yu Xu
Assignee: Yu Xu
Fix For: 1.42.0
Consider a SQL query like:
{code:sql}
select sal, max(sal) as sal_max, sum(comm) as comm_sum
from emp
group by sal, deptno;
{code}
Because sal is a GROUP BY key, every row in a group has the same sal value, so
max(sal) always equals sal and computing the aggregate is redundant. The query
can be rewritten as:
{code:sql}
select sal, sal as sal_max, sum(comm) as comm_sum
from emp
group by sal, deptno;
{code}
The current plan is:
{code:java}
LogicalProject(SAL=[$0], SAL_MAX=[$2], COMM_SUM=[$3])
  LogicalAggregate(group=[{0, 1}], SAL_MAX=[MAX($0)], COMM_SUM=[SUM($2)])
    LogicalProject(SAL=[$5], DEPTNO=[$7], COMM=[$6])
      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
{code}
It would be better to optimize it to:
{code:java}
LogicalProject(SAL=[$0], SAL_MAX=[$2], COMM_SUM=[$3])
  LogicalProject(SAL=[$0], DEPTNO=[$1], SAL0=[$0], COMM_SUM=[$2])
    LogicalAggregate(group=[{0, 1}], COMM_SUM=[SUM($2)])
      LogicalProject(SAL=[$5], DEPTNO=[$7], COMM=[$6])
        LogicalTableScan(table=[[CATALOG, SALES, EMP]])
{code}
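To illustrate the plan rewrite above, here is a minimal Java sketch of the core decision. It is not Calcite's API and not the proposed rule's implementation: GroupKeyAggSketch, AggCall, and Rewrite are hypothetical names used only for this example. It assumes a plain GROUP BY (no grouping sets, where a key can be NULL in rolled-up rows) and treats only MAX, MIN, and ANY_VALUE as redundant when their argument is a group key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the rewrite; none of these types are
// Calcite classes.
class GroupKeyAggSketch {

  /** An aggregate call over one input column, e.g. MAX($0). */
  static final class AggCall {
    final String func;
    final int arg;
    AggCall(String func, int arg) { this.func = func; this.arg = arg; }
    @Override public String toString() { return func + "($" + arg + ")"; }
  }

  /** The aggregate calls that survive, plus the projection placed on top. */
  static final class Rewrite {
    final List<AggCall> keptCalls;
    final List<Integer> projection;
    Rewrite(List<AggCall> keptCalls, List<Integer> projection) {
      this.keptCalls = keptCalls;
      this.projection = projection;
    }
  }

  /** Assumption: only these functions return their input when it is constant per group. */
  static boolean isRedundant(AggCall call, List<Integer> groupKeys) {
    boolean returnsInputValue = call.func.equals("MAX")
        || call.func.equals("MIN")
        || call.func.equals("ANY_VALUE");
    return returnsInputValue && groupKeys.contains(call.arg);
  }

  /** Drops redundant calls and builds a projection that restores the original output columns. */
  static Rewrite rewrite(List<Integer> groupKeys, List<AggCall> calls) {
    List<AggCall> kept = new ArrayList<>();
    List<Integer> projection = new ArrayList<>();
    // Group keys occupy the first output columns and pass through unchanged.
    for (int i = 0; i < groupKeys.size(); i++) {
      projection.add(i);
    }
    for (AggCall call : calls) {
      if (isRedundant(call, groupKeys)) {
        // Reuse the output column of the matching group key.
        projection.add(groupKeys.indexOf(call.arg));
      } else {
        // Surviving calls follow the group keys in the new aggregate's output.
        projection.add(groupKeys.size() + kept.size());
        kept.add(call);
      }
    }
    return new Rewrite(kept, projection);
  }

  public static void main(String[] args) {
    // group=[{0, 1}], SAL_MAX=[MAX($0)], COMM_SUM=[SUM($2)]
    Rewrite r = rewrite(
        Arrays.asList(0, 1),
        Arrays.asList(new AggCall("MAX", 0), new AggCall("SUM", 2)));
    System.out.println(r.keptCalls);   // prints [SUM($2)]
    System.out.println(r.projection);  // prints [0, 1, 0, 2]
  }
}
```

The projection [0, 1, 0, 2] corresponds to the inserted LogicalProject(SAL=[$0], DEPTNO=[$1], SAL0=[$0], COMM_SUM=[$2]) in the optimized plan, and the only remaining aggregate call is SUM($2).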
As far as I know, similar optimizations exist in some mainstream databases.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)