[ https://issues.apache.org/jira/browse/IGNITE-25218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vladimir Steshin updated IGNITE-25218: -------------------------------------- Description: Current TPCH queries #5, #7 are slow (scale 0.1). Let's consider {*}#5{*}. The *plan* is: {code:java} IgniteSort(sort0=[$1], dir0=[DESC-nulls-last]) IgniteColocatedHashAggregate(group=[{0}], REVENUE=[SUM($1)]) IgniteProject(N_NAME=[$12], $f1=[*($7, -(1, $8))]) [1] IgniteNestedLoopJoin(condition=[AND(=($4, $10), =($5, $0))], joinType=[inner]) [2] IgniteMergeJoin(condition=[=($3, $1)], joinType=[inner], leftCollation=[[1 ASC-nulls-first, 0 ASC-nulls-first]], rightCollation=[[0 ASC-nulls-first]]) IgniteExchange(distribution=[single]) IgniteIndexScan(table=[[PUBLIC, ORDERS]], index=[O_CK_proxy], filters=[AND(>=($t2, 1994-01-01), <($t2, +(1994-01-01, *(12:INTERVAL YEAR, 1))))], requiredColumns=[{2, 3, 6}], collation=[[3 ASC-nulls-first, 2 ASC-nulls-first]]) IgniteExchange(distribution=[single]) IgniteIndexScan(table=[[PUBLIC, CUSTOMER]], index=[_key_PK_proxy], requiredColumns=[{2, 5}], collation=[[2 ASC-nulls-first]]) [3] IgniteNestedLoopJoin(condition=[=($1, $4)], joinType=[inner]) IgniteExchange(distribution=[single]) IgniteTableScan(table=[[PUBLIC, LINEITEM]], requiredColumns=[{2, 4, 7, 8}]) [4] IgniteMergeJoin(condition=[=($1, $2)], joinType=[inner], leftCollation=[[1 ASC-nulls-first, 0 ASC-nulls-first]], rightCollation=[[0 ASC-nulls-first]]) IgniteExchange(distribution=[single]) IgniteIndexScan(table=[[PUBLIC, SUPPLIER]], index=[S_NK_proxy], requiredColumns=[{2, 5}], collation=[[5 ASC-nulls-first, 2 ASC-nulls-first]]) [5] IgniteNestedLoopJoin(condition=[=($2, $3)], joinType=[inner]) IgniteExchange(distribution=[single]) IgniteIndexScan(table=[[PUBLIC, NATION]], index=[_key_PK_proxy], requiredColumns=[{2, 3, 4}], collation=[[2 ASC-nulls-first]]) IgniteExchange(distribution=[single]) IgniteTableScan(table=[[PUBLIC, REGION]], filters=[=($t1, _UTF-8'ASIA')], requiredColumns=[{2, 3}]) {code} ** *The problem* is the NL join [1]. Calcite chooses NL because it *estimanes only few rows* producted by another joins in the plan. Compare estimations of join rows number against real issued join rows: ||Join||Estimated rows||Created rows|| |[1]|0.0225|865| |[2]|2250|22958| |[3]|0.15|134374| |[4]|0.15|225| |[5]|0.15|5| {*}Workaround{*}: /*+ MERGE_JOIN */ or /*+ NO_NL_JOIN */ was: Current TPCH queries #5, #7 are slow (scale 0.1). Let's consider {*}#5{*}. The *plan* is: {code:java} IgniteSort(sort0=[$1], dir0=[DESC-nulls-last]) IgniteColocatedHashAggregate(group=[{0}], REVENUE=[SUM($1)]) IgniteProject(N_NAME=[$12], $f1=[*($7, -(1, $8))]) [1] IgniteNestedLoopJoin(condition=[AND(=($4, $10), =($5, $0))], joinType=[inner]) [2] IgniteMergeJoin(condition=[=($3, $1)], joinType=[inner], leftCollation=[[1 ASC-nulls-first, 0 ASC-nulls-first]], rightCollation=[[0 ASC-nulls-first]]) IgniteExchange(distribution=[single]) IgniteIndexScan(table=[[PUBLIC, ORDERS]], index=[O_CK_proxy], filters=[AND(>=($t2, 1994-01-01), <($t2, +(1994-01-01, *(12:INTERVAL YEAR, 1))))], requiredColumns=[{2, 3, 6}], collation=[[3 ASC-nulls-first, 2 ASC-nulls-first]]) IgniteExchange(distribution=[single]) IgniteIndexScan(table=[[PUBLIC, CUSTOMER]], index=[_key_PK_proxy], requiredColumns=[{2, 5}], collation=[[2 ASC-nulls-first]]) [3] IgniteNestedLoopJoin(condition=[=($1, $4)], joinType=[inner]) IgniteExchange(distribution=[single]) IgniteTableScan(table=[[PUBLIC, LINEITEM]], requiredColumns=[{2, 4, 7, 8}]) [4] IgniteMergeJoin(condition=[=($1, $2)], joinType=[inner], leftCollation=[[1 ASC-nulls-first, 0 ASC-nulls-first]], rightCollation=[[0 ASC-nulls-first]]) IgniteExchange(distribution=[single]) IgniteIndexScan(table=[[PUBLIC, SUPPLIER]], index=[S_NK_proxy], requiredColumns=[{2, 5}], collation=[[5 ASC-nulls-first, 2 ASC-nulls-first]]) [5] IgniteNestedLoopJoin(condition=[=($2, $3)], joinType=[inner]) IgniteExchange(distribution=[single]) IgniteIndexScan(table=[[PUBLIC, NATION]], index=[_key_PK_proxy], requiredColumns=[{2, 3, 4}], collation=[[2 ASC-nulls-first]]) IgniteExchange(distribution=[single]) IgniteTableScan(table=[[PUBLIC, REGION]], filters=[=($t1, _UTF-8'ASIA')], requiredColumns=[{2, 3}]) {code} ** *The problem* is the NL join [1]. Calcite chooses NL because it *estimanes only few rows* producted by another joins in the plan. Compare estimations of join rows number against real issued join rows: ||Join||Estimated rows||Created rows|| |[1]|0.0225|865| |[2]|2250|22958| |[3]|0.15|134374| |[4]|0.15|225| |[5]|0.15|5| {*}Workaround{*}: /*+ MERGE_JOIN */ > Calcite. Revise joins rows estimation. > -------------------------------------- > > Key: IGNITE-25218 > URL: https://issues.apache.org/jira/browse/IGNITE-25218 > Project: Ignite > Issue Type: Improvement > Reporter: Vladimir Steshin > Priority: Major > Labels: calcite, ignite-2, ise > > Current TPCH queries #5, #7 are slow (scale 0.1). Let's consider {*}#5{*}. > The *plan* is: > {code:java} > IgniteSort(sort0=[$1], dir0=[DESC-nulls-last]) > IgniteColocatedHashAggregate(group=[{0}], REVENUE=[SUM($1)]) > IgniteProject(N_NAME=[$12], $f1=[*($7, -(1, $8))]) > [1] IgniteNestedLoopJoin(condition=[AND(=($4, $10), =($5, $0))], > joinType=[inner]) > [2] IgniteMergeJoin(condition=[=($3, $1)], joinType=[inner], > leftCollation=[[1 ASC-nulls-first, 0 ASC-nulls-first]], rightCollation=[[0 > ASC-nulls-first]]) > IgniteExchange(distribution=[single]) > IgniteIndexScan(table=[[PUBLIC, ORDERS]], index=[O_CK_proxy], > filters=[AND(>=($t2, 1994-01-01), <($t2, +(1994-01-01, *(12:INTERVAL YEAR, > 1))))], requiredColumns=[{2, 3, 6}], collation=[[3 ASC-nulls-first, 2 > ASC-nulls-first]]) > IgniteExchange(distribution=[single]) > IgniteIndexScan(table=[[PUBLIC, CUSTOMER]], > index=[_key_PK_proxy], requiredColumns=[{2, 5}], collation=[[2 > ASC-nulls-first]]) > [3] IgniteNestedLoopJoin(condition=[=($1, $4)], joinType=[inner]) > IgniteExchange(distribution=[single]) > IgniteTableScan(table=[[PUBLIC, LINEITEM]], requiredColumns=[{2, > 4, 7, 8}]) > [4] IgniteMergeJoin(condition=[=($1, $2)], joinType=[inner], > leftCollation=[[1 ASC-nulls-first, 0 ASC-nulls-first]], rightCollation=[[0 > ASC-nulls-first]]) > IgniteExchange(distribution=[single]) > IgniteIndexScan(table=[[PUBLIC, SUPPLIER]], index=[S_NK_proxy], > requiredColumns=[{2, 5}], collation=[[5 ASC-nulls-first, 2 ASC-nulls-first]]) > [5] IgniteNestedLoopJoin(condition=[=($2, $3)], joinType=[inner]) > IgniteExchange(distribution=[single]) > IgniteIndexScan(table=[[PUBLIC, NATION]], > index=[_key_PK_proxy], requiredColumns=[{2, 3, 4}], collation=[[2 > ASC-nulls-first]]) > IgniteExchange(distribution=[single]) > IgniteTableScan(table=[[PUBLIC, REGION]], filters=[=($t1, > _UTF-8'ASIA')], requiredColumns=[{2, 3}]) > {code} > ** > *The problem* is the NL join [1]. > Calcite chooses NL because it *estimanes only few rows* producted by another > joins in the plan. > Compare estimations of join rows number against real issued join rows: > ||Join||Estimated rows||Created rows|| > |[1]|0.0225|865| > |[2]|2250|22958| > |[3]|0.15|134374| > |[4]|0.15|225| > |[5]|0.15|5| > > {*}Workaround{*}: > /*+ MERGE_JOIN */ or /*+ NO_NL_JOIN */ -- This message was sent by Atlassian Jira (v8.20.10#820010)