berkaysynnada commented on issue #15524:
URL: https://github.com/apache/datafusion/issues/15524#issuecomment-2771901348

   I haven't reviewed the PR yet, but I agree with @Dandandan, and I think we 
can improve this further. We've actually thought about this issue before and 
sketched out an initial design. Let me share some notes from that:
   
   This simplification should be based on the **linearity property**, not just 
`SUM()` and `COUNT()` rewrites. Formally:
   
   **_f(x + y) = f(x) + f(y)_**, if f is a linear function.
   
   So, I believe we should define a `"linear function"` tag for all such 
functions.
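
For `SUM` over rows i = 1..N, the property can be written out as below. This is only a sketch: it assumes no NULL inputs and ignores overflow, and the symbols N, x_i, y_i and c are notation introduced here, not taken from the design notes.

```latex
\sum_{i=1}^{N} (x_i + y_i) = \sum_{i=1}^{N} x_i + \sum_{i=1}^{N} y_i
\qquad\text{and, for a constant } c,\qquad
\sum_{i=1}^{N} (x_i + c) = \sum_{i=1}^{N} x_i + c \cdot N
```

The first identity covers a non-constant second operand (`SUM(a + b)` becomes `SUM(a) + SUM(b)`); the second is the constant case, where N plays the role of the row count.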
   
   Consider the same query:
   ```sql
   SELECT SUM(id), SUM(id + 1), SUM(id + 2), ..., SUM(id + 89) FROM employees;
   ```
   
   LP:
   ```
   --Aggregate: groupBy=[[]], aggr=[
     sum(__common_expr_1 AS employees.id),
     sum(__common_expr_1 AS employees.id + Int64(1)),
     ...,
     sum(__common_expr_1 AS employees.id + Int64(89))
   ]
   ----Projection: CAST(employees.id AS Int64) AS __common_expr_1
   ------TableScan: employees projection=[id]
   ```
   
   PP:
   ```
   --AggregateExec: mode=Single, gby=[], aggr=[
     sum(employees.id),
     sum(employees.id + Int64(1)),
     ...,
     sum(employees.id + Int64(89))
   ]
   ----ProjectionExec: expr=[CAST(id@0 AS Int64) as __common_expr_1]
   ------MemoryExec: partitions=1, partition_sizes=[1]
   ```
   
   We should apply the linearity property here to simplify expressions like 
`SUM(id + n)` into `SUM(id) + n * COUNT(1)` when n is a constant. It doesn't 
affect the performance of this ClickBench query, but we should also properly 
handle the cases where n is not constant. 
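
As a quick sanity check of the constant-offset rewrite, here is a minimal sketch that runs both forms against an in-memory SQLite table through Python's standard `sqlite3` module. It uses SQLite rather than DataFusion, and the helper name `check_rewrite` is hypothetical. It also compares `COUNT(id)` with `COUNT(*)`, on the assumption that `id` may be nullable: `SUM` skips NULL inputs, so the two counts differ once a NULL is present.

```python
import sqlite3


def check_rewrite(ids, n):
    """Return (original, rewritten with COUNT(id), rewritten with COUNT(*)).

    Hypothetical helper for illustration; uses SQLite, not DataFusion.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (id INTEGER)")
    conn.executemany("INSERT INTO employees VALUES (?)", [(i,) for i in ids])

    # Original form: the addition is evaluated once per row.
    original = conn.execute(
        "SELECT SUM(id + ?) FROM employees", (n,)
    ).fetchone()[0]

    # Rewritten form counting only non-NULL ids.
    with_count_id = conn.execute(
        "SELECT SUM(id) + ? * COUNT(id) FROM employees", (n,)
    ).fetchone()[0]

    # Rewritten form counting every row, NULL ids included.
    with_count_all = conn.execute(
        "SELECT SUM(id) + ? * COUNT(*) FROM employees", (n,)
    ).fetchone()[0]

    conn.close()
    return original, with_count_id, with_count_all


print(check_rewrite([1, 2, 3], 5))     # (21, 21, 21)
print(check_rewrite([1, None, 3], 5))  # (14, 14, 19)
print(check_rewrite([], 5))            # (None, None, None)
```

Without NULLs all three forms agree. With a NULL `id`, only the `COUNT(id)` variant matches the original, so a real rewrite rule would presumably need to take the nullability of the argument into account.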


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

