[
https://issues.apache.org/jira/browse/CALCITE-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alessandro Solimando updated CALCITE-7203:
------------------------------------------
Description:
[IntersectToSemiJoinRule|https://github.com/apache/calcite/blob/9014934d8c24a5242a6840efe20134e820426c24/core/src/main/java/org/apache/calcite/rel/rules/IntersectToSemiJoinRule.java#L119-L128]
repeatedly creates cast expressions between pairs of intersect operands, whereas
we could "pre-compute" these join keys once, targeting the row type of the n-way
intersect expression, which is the final type that all intersect operands must
conform to.
The current implementation computes the join keys pair-wise, which introduces
duplicates and noise because each step unifies types only partially, instead of
relying on the stable type unification over the final/global row type.
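The difference between the two strategies can be illustrated with a toy model. This is only a sketch under stated assumptions: the class {{JoinKeySketch}}, the type {{Dec}} and its simplified {{unify}} rule are hypothetical and are not Calcite APIs, and the operand types are chosen for illustration rather than taken from planner.iq.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of join-key casting; none of these names are Calcite APIs. */
public class JoinKeySketch {

  /** Hypothetical stand-in for a DECIMAL(precision, scale) column type. */
  public static final class Dec {
    final int precision;
    final int scale;

    public Dec(int precision, int scale) {
      this.precision = precision;
      this.scale = scale;
    }

    /** Simplified unification (an assumption): widest integer part, widest scale. */
    Dec unify(Dec other) {
      int s = Math.max(scale, other.scale);
      int intDigits = Math.max(precision - scale, other.precision - other.scale);
      return new Dec(intDigits + s, s);
    }

    @Override public String toString() {
      return "DECIMAL(" + precision + ", " + scale + ")";
    }
  }

  /** Pair-wise: each join step casts both of its sides to a type unified for that step only. */
  public static List<String> pairwiseCastTargets(List<Dec> operands) {
    List<String> targets = new ArrayList<>();
    Dec left = operands.get(0);
    for (int i = 1; i < operands.size(); i++) {
      Dec target = left.unify(operands.get(i));
      targets.add(target.toString()); // cast on the left side of this step
      targets.add(target.toString()); // cast on the right side of this step
      left = target;                  // this step's output feeds the next step
    }
    return targets;
  }

  /** Global: unify all operand types once, then cast each operand a single time. */
  public static List<String> globalCastTargets(List<Dec> operands) {
    Dec target = operands.get(0);
    for (Dec operand : operands) {
      target = target.unify(operand);
    }
    List<String> targets = new ArrayList<>();
    for (int i = 0; i < operands.size(); i++) {
      targets.add(target.toString());
    }
    return targets;
  }

  public static void main(String[] args) {
    List<Dec> operands = Arrays.asList(new Dec(2, 1), new Dec(3, 2), new Dec(10, 0));
    System.out.println("pair-wise: " + pairwiseCastTargets(operands));
    System.out.println("global:    " + globalCastTargets(operands));
  }
}
```

In this model the pair-wise strategy first casts to the intermediate type DECIMAL(3, 2) and only later to DECIMAL(12, 2), while the global strategy casts every operand directly to DECIMAL(12, 2). The real rule works on row types and RexNode casts, so the exact number and placement of casts in an actual plan can differ from this sketch.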
[planner.iq#L150-L179|https://github.com/apache/calcite/blob/9014934d8c24a5242a6840efe20134e820426c24/core/src/test/resources/sql/planner.iq#L150-L179]
could be simplified as follows.
before:
{noformat}
EnumerableCalc(expr#0..1=[{inputs}], expr#2=[CAST($t0):DECIMAL(11, 1)], A=[$t2])
  EnumerableHashJoin(condition=[=($1, $3)], joinType=[semi])
    EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], proj#0..1=[{exprs}])
      EnumerableAggregate(group=[{0}])
        EnumerableHashJoin(condition=[=($1, $3)], joinType=[semi])
          EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) NOT NULL], A=[$t1], A0=[$t1])
            EnumerableValues(tuples=[[{ 1.0 }, { 2.0 }, { 3.0 }, { 4.0 }, { 5.0 }]])
          EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) NOT NULL], A=[$t1], A0=[$t1])
            EnumerableValues(tuples=[[{ 1 }, { 2 }]])
    EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], A=[$t1], A0=[$t1]) <= extra A0
      EnumerableValues(tuples=[[{ 1.0 }, { 4.0 }, { null }]])
{noformat}
after:
{noformat}
EnumerableAggregate(group=[{0}])
  EnumerableNestedLoopJoin(condition=[IS NOT DISTINCT FROM($0, $1)], joinType=[semi])
    EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], A=[$t1])
      EnumerableAggregate(group=[{0}])
        EnumerableNestedLoopJoin(condition=[IS NOT DISTINCT FROM($0, $1)], joinType=[semi])
          EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) NOT NULL], A=[$t1])
            EnumerableValues(tuples=[[{ 1.0 }, { 2.0 }, { 3.0 }, { 4.0 }, { 5.0 }]])
          EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) NOT NULL], A=[$t1]) <= no more A0
            EnumerableValues(tuples=[[{ 1 }, { 2 }]])
    EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], A=[$t1])
      EnumerableValues(tuples=[[{ 1.0 }, { 4.0 }, { null }]])
{noformat}
[This PR discussion|https://github.com/apache/calcite/pull/4557#discussion_r2384022473]
explains in more detail why this is needed.
> IntersectToSemiJoinRule should compute once the join keys and reuse them to
> avoid duplicates
> --------------------------------------------------------------------------------------------
>
> Key: CALCITE-7203
> URL: https://issues.apache.org/jira/browse/CALCITE-7203
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.40.0
> Reporter: Alessandro Solimando
> Assignee: Alessandro Solimando
> Priority: Major