Alessandro Solimando created CALCITE-7125:
---------------------------------------------

             Summary: Impossible to get a plan with partial aggregate push-down 
via IntersectToDistinctRule
                 Key: CALCITE-7125
                 URL: https://issues.apache.org/jira/browse/CALCITE-7125
             Project: Calcite
          Issue Type: Bug
          Components: core
    Affects Versions: 1.40.0
            Reporter: Alessandro Solimando


CALCITE-6893 changed _IntersectToDistinctRule_ to rewrite "INTERSECT DISTINCT" 
queries using a tagging plus filtering strategy rather than pushing aggregates 
into the "UNION" inputs.

Rationale was to let other rules (_AGGREGATE_UNION_TRANSPOSE_ is mentioned in 
the ticket description) produce the plan with partial aggregation push-down, 
and let the CBO do the choice.

Consider the following query:
{noformat}
SELECT empno FROM emp
INTERSECT
SELECT empno FROM emp
{noformat}

_IntersectToDistinctRule_, as per CALCITE-6893, would rewrite it as follows:

{noformat}
LogicalProject(EMPNO=[$0])
  LogicalFilter(condition=[AND(>($1, 0), >($2, 0))])
    LogicalAggregate(group=[{0}], count_i0=[COUNT() FILTER (WHERE i = 0)], 
count_i1=[COUNT() FILTER (WHERE i = 1)])
      LogicalUnion(all=[true])
        LogicalProject(EMPNO=[$0], i=[0]) // "tag" i=0
          TableScan(emp)
        LogicalProject(EMPNO=[$0], i=[1]) // "tag" i=1
          TableScan(emp)
{noformat}

This structure introduces:
 - A tagging column "i" added via projection
 - Two Projects wrapping each input
 - A single Aggregate with COUNT() FILTER expressions over those tags
 - A final filter requiring count_iN > 0, for N in "1..#operands"

With this rewrite, the planner:
 - Can no longer push aggregates into the union inputs
 - Cannot apply rules like 
_AGGREGATE_UNION_TRANSPOSE_/_AggregateUnionTransposeRule_ or 
_AGGREGATE_PROJECT_MERGE_/_AggregateProjectMergeRule_
 - Fails to recover the earlier, sometimes more efficient plan shape:

{noformat}
LogicalAggregate(group=[{0}])
  LogicalUnion(all=[true])
    LogicalAggregate(group=[{0}])
      ...
    LogicalAggregate(group=[{0}])
      ...
{noformat}

This prevents early row reduction and downstream rule composition, even when 
all relevant transformation rules are enabled.

Prior to CALCITE-6893, the same query with the same planner configuration 
produced the optimized form above. That plan is no longer reachable with the 
current {_}IntersectToDistinctRule{_}, and no combination of core rules 
(including _AGGREGATE_UNION_TRANSPOSE_/_AggregateUnionTransposeRule_, 
_PROJECT_SET_OP_TRANSPOSE_/_ProjectSetOpTransposeRule_, etc.) can recover it.

Unless I am mistaken and someone can suggest further rules that can lead to the 
same plan as before, this seems like a regression.

My proposal would be to either revert the change, or to at least expose the 
"old" behavior via a configuration flag.

My understanding is that, the original rewrite proposed in CALCITE-6893 was 
supposed to rewritable (by means of 
_AGGREGATE_UNION_TRANSPOSE_/_AggregateUnionTransposeRule_ rule). The rewrite 
was incorrect and the currently implemented form has been suggested, but it 
hasn't been discussed if the "new" plan shape (post CALCITE-6893) could be 
transformed into the "old" plan shape (pre CALCITE-6893).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to