berkaysynnada commented on issue #15524: URL: https://github.com/apache/datafusion/issues/15524#issuecomment-2771901348
I haven't reviewed the PR yet, but I agree with @Dandandan, and I think we can improve this further. We've actually thought about this issue before and sketched out an initial design. Let me share some notes from that:

This simplification should be based on the **linearity property**, not just `SUM()` and `COUNT()` rewrites. Formally: **_f(x + y) = f(x) + f(y)_**, if f is a linear function. So, I believe we should define a `"linear function"` tag for all such functions.

Consider the same query:
```sql
SELECT SUM(id), SUM(id + 1), SUM(id + 2), ..., SUM(id + 89)
FROM employees;
```
LP:
```
--Aggregate: groupBy=[[]], aggr=[
    sum(__common_expr_1 AS employees.id),
    sum(__common_expr_1 AS employees.id + Int64(1)),
    ...,
    sum(__common_expr_1 AS employees.id + Int64(89))
  ]
----Projection: CAST(employees.id AS Int64) AS __common_expr_1
------TableScan: employees projection=[id]
```
PP:
```
--AggregateExec: mode=Single, gby=[], aggr=[
    sum(employees.id),
    sum(employees.id + Int64(1)),
    ...,
    sum(employees.id + Int64(89))
  ]
----ProjectionExec: expr=[CAST(id@0 AS Int64) as __common_expr_1]
------MemoryExec: partitions=1, partition_sizes=[1]
```

We should apply the linearity property here to simplify expressions like `SUM(id + n)` into `SUM(id) + n * COUNT(1)` when `n` is a constant. This doesn't affect the performance of this ClickBench query, but we should also properly handle the cases where `n` is not constant.
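
To make the proposed rewrite concrete, here is a minimal sketch that compares direct evaluation of `SUM(id + n)` against the rewritten forms. It uses Python's built-in `sqlite3` as a stand-in engine, which is an assumption made for illustration; it is not DataFusion code, and the helper name `check_sum_rewrite` is hypothetical. One caveat the sketch surfaces: SQL `SUM` skips NULL inputs, so `n * COUNT(id)` matches direct evaluation even when `id` is nullable, whereas `n * COUNT(1)` matches only when `id` contains no NULLs.

```python
import sqlite3


def check_sum_rewrite(ids, n):
    """Evaluate SUM(id + n) three ways over a throwaway in-memory table.

    Returns (direct, rewritten_count_id, rewritten_count_1).
    Illustrative only: SQLite stands in for the query engine here.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (id INTEGER)")
    conn.executemany("INSERT INTO employees VALUES (?)", [(i,) for i in ids])
    row = conn.execute(
        "SELECT SUM(id + ?), "               # direct evaluation
        "       SUM(id) + ? * COUNT(id), "   # rewrite counting non-NULL ids
        "       SUM(id) + ? * COUNT(1) "     # rewrite counting all rows
        "FROM employees",
        (n, n, n),
    ).fetchone()
    conn.close()
    return tuple(row)


if __name__ == "__main__":
    # No NULLs: all three forms agree.
    print(check_sum_rewrite([1, 2, 3], 5))
    # With a NULL id: COUNT(1) also counts the NULL row, so it diverges.
    print(check_sum_rewrite([1, 2, None, 4], 3))
```

If an optimizer rule along these lines were implemented, it would presumably need to pick the `COUNT` argument based on the nullability of the summed column; that design detail is not settled in the comment above.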