[
https://issues.apache.org/jira/browse/CALCITE-7125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18011864#comment-18011864
]
Alessandro Solimando commented on CALCITE-7125:
-----------------------------------------------
https://github.com/asolimando/calcite/tree/main-allow_aggregate_pushdown_for_intersecttodistinctrule
here I sketched the solution via the config rule flag, and added two
reproducer tests in RelOptRulesTest in case someone wants to try to get back
the "old" plan or check more easily what's happening
> Impossible to get a plan with partial aggregate push-down via
> IntersectToDistinctRule
> -------------------------------------------------------------------------------------
>
> Key: CALCITE-7125
> URL: https://issues.apache.org/jira/browse/CALCITE-7125
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.40.0
> Reporter: Alessandro Solimando
> Priority: Major
>
> CALCITE-6893 changed _IntersectToDistinctRule_ to rewrite "INTERSECT
> DISTINCT" queries using a tagging plus filtering strategy rather than pushing
> aggregates into the "UNION" inputs.
> Rationale was to let other rules (_AGGREGATE_UNION_TRANSPOSE_ is mentioned in
> the ticket description) produce the plan with partial aggregation push-down,
> and let the CBO do the choice.
> Consider the following query:
> {noformat}
> SELECT empno FROM emp
> INTERSECT
> SELECT empno FROM emp
> {noformat}
> _IntersectToDistinctRule_, as per CALCITE-6893, would rewrite it as follows:
> {noformat}
> LogicalProject(EMPNO=[$0])
> LogicalFilter(condition=[AND(>($1, 0), >($2, 0))])
> LogicalAggregate(group=[{0}], count_i0=[COUNT() FILTER (WHERE i = 0)],
> count_i1=[COUNT() FILTER (WHERE i = 1)])
> LogicalUnion(all=[true])
> LogicalProject(EMPNO=[$0], i=[0]) // "tag" i=0
> TableScan(emp)
> LogicalProject(EMPNO=[$0], i=[1]) // "tag" i=1
> TableScan(emp)
> {noformat}
> This structure introduces:
> - A tagging column "i" added via projection
> - Two Projects wrapping each input
> - A single Aggregate with COUNT() FILTER expressions over those tags
> - A final filter requiring count_iN > 0, for N in "1..#operands"
> With this rewrite, the planner:
> - Can no longer push aggregates into the union inputs
> - Cannot apply rules like
> _AGGREGATE_UNION_TRANSPOSE_/_AggregateUnionTransposeRule_ or
> _AGGREGATE_PROJECT_MERGE_/_AggregateProjectMergeRule_
> - Fails to recover the earlier, sometimes more efficient plan shape:
> {noformat}
> LogicalAggregate(group=[{0}])
> LogicalUnion(all=[true])
> LogicalAggregate(group=[{0}])
> ...
> LogicalAggregate(group=[{0}])
> ...
> {noformat}
> This prevents early row reduction and downstream rule composition, even when
> all relevant transformation rules are enabled.
> Prior to CALCITE-6893, the same query with the same planner configuration
> produced the optimized form above. That plan is no longer reachable with the
> current {_}IntersectToDistinctRule{_}, and no combination of core rules
> (including _AGGREGATE_UNION_TRANSPOSE_/_AggregateUnionTransposeRule_,
> _PROJECT_SET_OP_TRANSPOSE_/_ProjectSetOpTransposeRule_, etc.) can recover it.
> Unless I am mistaken and someone can suggest further rules that can lead to
> the same plan as before, this seems like a regression.
> My proposal would be to either revert the change, or to at least expose the
> "old" behavior via a configuration flag.
> My understanding is that, the original rewrite proposed in CALCITE-6893 was
> supposed to rewritable (by means of
> _AGGREGATE_UNION_TRANSPOSE_/_AggregateUnionTransposeRule_ rule). The rewrite
> was incorrect and the currently implemented form has been suggested, but it
> hasn't been discussed if the "new" plan shape (post CALCITE-6893) could be
> transformed into the "old" plan shape (pre CALCITE-6893).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)