[jira] [Commented] (IGNITE-18341) Calcite engine. Introduce correlate based distribution

Ignite TC Bot (Jira) Mon, 12 Dec 2022 06:59:05 -0800


    [ 
https://issues.apache.org/jira/browse/IGNITE-18341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646138#comment-17646138
 ]


Ignite TC Bot commented on IGNITE-18341:
----------------------------------------

{panel:title=Branch: [pull/10424/head] Base: [master] : No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/10424/head] Base: [master] : New Tests 
(2)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#00008b}Calcite SQL{color} [[tests 
2|https://ci2.ignite.apache.org/viewLog.html?buildId=6950074]]
* {color:#013220}IgniteCalciteTestSuite: 
CorrelatedSubqueryPlannerTest.testCorrelatedDistribution - PASSED{color}
* {color:#013220}IgniteCalciteTestSuite: 
CorrelatesIntegrationTest.testCorrelatedDistribution - PASSED{color}

{panel}
[TeamCity *--> Run :: All* 
Results|https://ci2.ignite.apache.org/viewLog.html?buildId=6950153&buildTypeId=IgniteTests24Java8_RunAll]

> Calcite engine. Introduce correlate based distribution
> ------------------------------------------------------
>
>                 Key: IGNITE-18341
>                 URL: https://issues.apache.org/jira/browse/IGNITE-18341
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Aleksey Plekhanov
>            Assignee: Aleksey Plekhanov
>            Priority: Major
>              Labels: calcite, calcite2-required, calcite3-required
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> To propagate a hash/affinity distribution through a relational node, all 
> distribution keys must be contained in the node. A hash distribution cannot 
> pass through a node that knows nothing about the distribution keys. For 
> example, a hash distribution can't pass through an aggregate if one or more 
> of the distribution keys is not contained in the grouped columns.
> Suppose, for example, we have two tables T1 and T2 colocated on fields T1.A 
> and T2.A. Consider the following query:
> {code:java}
> SELECT (SELECT sum(b) FROM t2 WHERE t2.a = t1.a) FROM t1 {code}
> Here hash distribution can't be used on the right side of the correlated 
> nested loop join, since the aggregate doesn't have the required columns, and 
> the plan for such a query looks very inefficient:
> {noformat}
> IgniteProject(EXPR$0=[$3]), id = 219
>   IgniteCorrelatedNestedLoopJoin(condition=[true], joinType=[left], 
> variablesSet=[[$cor0]], correlationVariables=[[$cor0]]), id = 218
>     IgniteExchange(distribution=[single]), id = 213
>       IgniteTableScan(table=[[PUBLIC, T1]]), id = 84
>     IgniteColocatedHashAggregate(group=[{}], SUM(B)=[SUM($0)]), id = 217
>       IgniteProject(B=[$1]), id = 216
>         IgniteHashIndexSpool(readType=[LAZY], writeType=[EAGER], 
> searchRow=[[$cor0.A, null]], condition=[=($0, $cor0.A)], allowNulls=[false]), 
> id = 215
>           IgniteExchange(distribution=[single]), id = 214
>             IgniteTableScan(table=[[PUBLIC, T2]], requiredColumns=[{0, 1}]), 
> id = 112{noformat}
> If we look closer at the query, we can see that the filter {{t2.a = t1.a}} 
> makes this query colocated. If we run such a query on the H2 engine with the 
> {{colocatedJoin=false}} flag, it will return the correct result. 
> To work around this problem, I propose to introduce a kind of artificial 
> "correlated distribution". This distribution will be produced on the right 
> side of a correlated nested loop join if the left side of the join has hash 
> distribution (it will contain a reference to the correlate and the 
> distribution of this correlate), then passed through a set of nodes without 
> modification, and finally restored on the filter node as a hash distribution 
> remapped to the input operator fields (if the filter contains equality 
> conditions between input operator fields and correlated variable fields).
> After such a change, the plan should look like:
> {noformat}
> IgniteExchange(distribution=[single]), id = 283
>   IgniteProject(EXPR$0=[$3]), id = 282
>     IgniteCorrelatedNestedLoopJoin(condition=[true], joinType=[left], 
> variablesSet=[[$cor0]], correlationVariables=[[$cor0]]), id = 281
>       IgniteTableScan(table=[[PUBLIC, T1]]), id = 84
>       IgniteColocatedHashAggregate(group=[{}], SUM(B)=[SUM($0)]), id = 280
>         IgniteProject(B=[$1]), id = 279
>           IgniteFilter(condition=[=($0, $cor0.A)]), id = 278
>             IgniteTableScan(table=[[PUBLIC, T2]], requiredColumns=[{0, 1}]), 
> id = 112
> {noformat}
>  
>  
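
The two rules in the description above (hash keys must be visible in a node, and a "correlated distribution" is restored to hash at the filter) can be illustrated with a small standalone sketch. This is only a toy model under stated assumptions: the class and method names (CorrelatedDistributionSketch, CorrEquality, canPassThrough, restoreHashKeys) are hypothetical and are not the Ignite or Calcite API, fields are plain column indexes, and the filter condition is assumed to be already decomposed into equalities of the form "inputField = $corN.field".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Toy model only; none of these types exist in Ignite or Calcite. */
public class CorrelatedDistributionSketch {
    /** Equality "inputField = $cor<corId>.<corField>" taken from a filter condition. */
    static final class CorrEquality {
        final int inputField;
        final int corId;
        final int corField;

        CorrEquality(int inputField, int corId, int corField) {
            this.inputField = inputField;
            this.corId = corId;
            this.corField = corField;
        }
    }

    /** A hash distribution survives a node only if the node exposes every key. */
    static boolean canPassThrough(List<Integer> hashKeys, Set<Integer> nodeFields) {
        return nodeFields.containsAll(hashKeys);
    }

    /**
     * Tries to turn a "correlated distribution" (correlate id plus the hash keys
     * of the left side) back into hash keys over the filter input. Returns empty
     * if some distribution key has no matching equality in the filter.
     */
    static Optional<List<Integer>> restoreHashKeys(
        int corId, List<Integer> corKeys, List<CorrEquality> equalities) {
        List<Integer> remapped = new ArrayList<>();

        for (int corKey : corKeys) {
            Integer inputField = null;

            for (CorrEquality eq : equalities) {
                if (eq.corId == corId && eq.corField == corKey) {
                    inputField = eq.inputField;
                    break;
                }
            }

            if (inputField == null)
                return Optional.empty(); // Key is not bound by the filter.

            remapped.add(inputField);
        }

        return Optional.of(remapped);
    }

    public static void main(String[] args) {
        // Aggregate with group=[{}] exposes no T2 columns, so hash(T2.A) cannot pass.
        System.out.println(canPassThrough(Arrays.asList(0), new HashSet<Integer>()));

        // Filter "=($0, $cor0.A)" with T1 hashed on A (field 0): keys remap to $0.
        System.out.println(restoreHashKeys(
            0, Arrays.asList(0), Arrays.asList(new CorrEquality(0, 0, 0))));
    }
}
```

In this model the restore step fails as soon as one distribution key of the correlate is not bound by an equality in the filter, which corresponds to the "if the filter contains equality conditions" clause of the proposal; how the real planner represents and matches these conditions is not shown here.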



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
