[ https://issues.apache.org/jira/browse/HIVE-26043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Krisztian Kasa reassigned HIVE-26043: ------------------------------------- > Use constraint info when creating RexNodes > ------------------------------------------ > > Key: HIVE-26043 > URL: https://issues.apache.org/jira/browse/HIVE-26043 > Project: Hive > Issue Type: Improvement > Components: CBO > Reporter: Krisztian Kasa > Assignee: Krisztian Kasa > Priority: Major > > Prior HIVE-23100 Not null constraints affected newly created RexNode type > nullability. > Nullability enables the subquery rewrite algorithm to generate more optimal > plan. > [https://github.com/apache/hive/blob/1213ad3f0ae0e21e7519dc28b8b6d1401cdd1441/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSubQueryRemoveRule.java#L324] > Example: > {code:java} > explain cbo > select ws_sales_price > from web_sales, customer, item > where ws_bill_customer_sk = c_customer_sk > and ws_item_sk = i_item_sk > and ( c_customer_sk = 1 > or > i_item_id in (select i_item_id > from item > where i_item_sk in (2, 3) > ) > ); > {code} > Without not null constraints > {code:java} > HiveProject(ws_sales_price=[$2]) > HiveFilter(condition=[OR(AND(<>($6, 0), IS NOT NULL($8)), =($3, 1))]) > HiveProject(ws_item_sk=[$0], ws_bill_customer_sk=[$1], > ws_sales_price=[$2], c_customer_sk=[$8], i_item_sk=[$3], i_item_id=[$4], > c=[$5], i_item_id0=[$6], literalTrue=[$7]) > HiveJoin(condition=[=($1, $8)], joinType=[inner], algorithm=[none], > cost=[not available]) > HiveJoin(condition=[=($0, $3)], joinType=[inner], algorithm=[none], > cost=[not available]) > HiveProject(ws_item_sk=[$2], ws_bill_customer_sk=[$3], > ws_sales_price=[$20]) > HiveFilter(condition=[IS NOT NULL($3)]) > HiveTableScan(table=[[default, web_sales]], > table:alias=[web_sales]) > HiveJoin(condition=[=($1, $3)], joinType=[left], algorithm=[none], > cost=[not available]) > HiveJoin(condition=[true], joinType=[inner], algorithm=[none], > cost=[not available]) > HiveProject(i_item_sk=[$0], i_item_id=[$1]) > HiveTableScan(table=[[default, item]], table:alias=[item]) > HiveProject(c=[$0]) > HiveAggregate(group=[{}], c=[COUNT()]) > HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)]) > HiveTableScan(table=[[default, item]], table:alias=[item]) > HiveProject(i_item_id=[$0], literalTrue=[true]) > HiveAggregate(group=[{1}]) > HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)]) > HiveTableScan(table=[[default, item]], table:alias=[item]) > HiveProject(c_customer_sk=[$0]) > HiveTableScan(table=[[default, customer]], table:alias=[customer]) > {code} > With not null constraints > {code:java} > HiveProject(ws_sales_price=[$2]) > HiveFilter(condition=[OR(IS NOT NULL($7), =($3, 1))]) > HiveProject(ws_item_sk=[$0], ws_bill_customer_sk=[$1], > ws_sales_price=[$2], c_customer_sk=[$7], i_item_sk=[$3], i_item_id=[$4], > i_item_id0=[$5], literalTrue=[$6]) > HiveJoin(condition=[=($1, $7)], joinType=[inner], algorithm=[none], > cost=[not available]) > HiveJoin(condition=[=($0, $3)], joinType=[inner], algorithm=[none], > cost=[not available]) > HiveProject(ws_item_sk=[$2], ws_bill_customer_sk=[$3], > ws_sales_price=[$20]) > HiveFilter(condition=[IS NOT NULL($3)]) > HiveTableScan(table=[[default, web_sales]], > table:alias=[web_sales]) > HiveJoin(condition=[=($1, $2)], joinType=[left], algorithm=[none], > cost=[not available]) > HiveProject(i_item_sk=[$0], i_item_id=[$1]) > HiveTableScan(table=[[default, item]], table:alias=[item]) > HiveProject(i_item_id=[$0], literalTrue=[true]) > HiveAggregate(group=[{1}]) > HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)]) > HiveTableScan(table=[[default, item]], table:alias=[item]) > HiveProject(c_customer_sk=[$0]) > HiveTableScan(table=[[default, customer]], table:alias=[customer]) > {code} > In the first plan when not null constraints was ignored there is an extra > {{item}} table join without join condition: > {code:java} > HiveJoin(condition=[true], joinType=[inner], algorithm=[none], > cost=[not available]) > HiveProject(i_item_sk=[$0], i_item_id=[$1]) > HiveTableScan(table=[[default, item]], table:alias=[item]) > HiveProject(c=[$0]) > HiveAggregate(group=[{}], c=[COUNT()]) > HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)]) > HiveTableScan(table=[[default, item]], table:alias=[item]) > {code} > The planner is not aware that the {{i_item_id}} column has {{not null}} > defined and it expects null values which needs the extra join. -- This message was sent by Atlassian Jira (v8.20.1#820001)