mertak-synnada opened a new issue, #15633: URL: https://github.com/apache/datafusion/issues/15633
### Current Behavior When a query such as the following is executed: `EXPLAIN SELECT SUM(id), SUM(id) + 1, SUM(id) + 2 FROM employees;` The resulting execution plans are: **Logical Plan:** ``` Projection: sum(employees.id), sum(employees.id) + Int64(1), sum(employee s.id) + Int64(2) Aggregate: groupBy=[[]], aggr=[[sum(CAST(employees.id AS Int64))]] TableScan: employees projection=[id] ``` **Physical Plan:** ``` ProjectionExec: expr=[sum(employees.id)@0 as sum(employees.id), sum(emp loyees.id)@0 + 1 as sum(employees.id) + Int64(1), sum(employees.id)@0 + 2 as sum(employees.id) + Int64(2)] AggregateExec: mode=Single, gby=[], aggr=[sum(employees.id)] MemoryExec: partitions=1, partition_sizes=[1] ``` As demonstrated, the repeated calculation of SUM(employees.id) is performed only once. The subsequent additions ( +1 and +2 ) are handled by the `ProjectionExec` through basic arithmetic operations. ### Problematic Case Consider a more complex query inspired by Clickbench Query 29: SELECT SUM(id), SUM(id+1), SUM(id+2), ..., SUM(id+89) FROM employees; The execution plans are: **Logical Plan:** ``` Aggregate: groupBy=[[]], aggr=[[sum(__common_expr_1 AS employees.id), su m(__common_expr_1 AS employees.id + Int64(1)), ... , sum(__common_expr_1 A S employees.id + Int64(89))]] Projection: CAST(employees.id AS Int64) AS __common_expr_1 TableScan: employees projection=[id] ``` **Physical Plan:** ``` AggregateExec: mode=Single, gby=[], aggr=[sum(employees.id), sum(employ ees.id + Int64(1)), ... , sum(employees.id + Int64(89))] ProjectionExec: expr=[CAST(id@0 AS Int64) as __common_expr_1] MemoryExec: partitions=1, partition_sizes=[1] ``` In this case, each `SUM` operation is calculated independently, which is inefficient. The desired approach is to compute `SUM(employees.id)` once and then apply the linearity property to compute` SUM(id + n)` as `SUM(id) + SUM(n)` for each additional term ( 1...89 ). If the `N` is a constant (here it is) the operation should be converted as `N*COUNT(1)` . Other references: https://github.com/apache/datafusion/pull/15532 https://github.com/apache/datafusion/issues/15524 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org