mertak-synnada opened a new issue, #15633:
URL: https://github.com/apache/datafusion/issues/15633

   ### Current Behavior
   When a query such as the following is executed:
   `EXPLAIN SELECT SUM(id), SUM(id) + 1, SUM(id) + 2 FROM employees;`
   The resulting execution plans are:
   **Logical Plan:**
   ```
   Projection: sum(employees.id), sum(employees.id) + Int64(1), sum(employee
   s.id) + Int64(2)
     Aggregate: groupBy=[[]], aggr=[[sum(CAST(employees.id AS Int64))]]
       TableScan: employees projection=[id]
   ```
   **Physical Plan:**
   ```
   ProjectionExec: expr=[sum(employees.id)@0 as sum(employees.id), sum(emp
   loyees.id)@0 + 1 as sum(employees.id) + Int64(1), sum(employees.id)@0 + 2
   as sum(employees.id) + Int64(2)]
     AggregateExec: mode=Single, gby=[], aggr=[sum(employees.id)]
       MemoryExec: partitions=1, partition_sizes=[1]
   ```
   
   As demonstrated, the repeated calculation of SUM(employees.id) is performed 
only
   once. The subsequent additions ( +1 and +2 ) are handled by the 
`ProjectionExec`
   through basic arithmetic operations.
   
   ### Problematic Case
   Consider a more complex query inspired by Clickbench Query 29:
   SELECT SUM(id), SUM(id+1), SUM(id+2), ..., SUM(id+89) FROM employees;
   The execution plans are:
   **Logical Plan:**
   ```
   Aggregate: groupBy=[[]], aggr=[[sum(__common_expr_1 AS employees.id), su
   m(__common_expr_1 AS employees.id + Int64(1)), ... , sum(__common_expr_1 A
   S employees.id + Int64(89))]]
     Projection: CAST(employees.id AS Int64) AS __common_expr_1
       TableScan: employees projection=[id]
   ```
   **Physical Plan:**
   ```
   AggregateExec: mode=Single, gby=[], aggr=[sum(employees.id), sum(employ
   ees.id + Int64(1)), ... , sum(employees.id + Int64(89))]
     ProjectionExec: expr=[CAST(id@0 AS Int64) as __common_expr_1]
       MemoryExec: partitions=1, partition_sizes=[1]
   ```
   
   In this case, each `SUM` operation is calculated independently, which is 
inefficient.
   The desired approach is to compute `SUM(employees.id)` once and then apply 
the
   linearity property to compute` SUM(id + n)` as `SUM(id) + SUM(n)` for each 
additional
   term ( 1...89 ). If the `N` is a constant (here it is) the operation should 
be converted as
   `N*COUNT(1)` .
   
   Other references: https://github.com/apache/datafusion/pull/15532
   https://github.com/apache/datafusion/issues/15524


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to