[jira] [Assigned] (HIVE-26043) Use constraint info when creating RexNodes

Krisztian Kasa (Jira) Thu, 17 Mar 2022 06:21:40 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-26043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Krisztian Kasa reassigned HIVE-26043:
-------------------------------------


> Use constraint info when creating RexNodes
> ------------------------------------------
>
>                 Key: HIVE-26043
>                 URL: https://issues.apache.org/jira/browse/HIVE-26043
>             Project: Hive
>          Issue Type: Improvement
>          Components: CBO
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>
> Prior HIVE-23100 Not null constraints affected newly created RexNode type 
> nullability.
> Nullability enables the subquery rewrite algorithm to generate more optimal 
> plan.
> [https://github.com/apache/hive/blob/1213ad3f0ae0e21e7519dc28b8b6d1401cdd1441/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSubQueryRemoveRule.java#L324]
> Example:
> {code:java}
> explain cbo
> select ws_sales_price
>  from web_sales, customer, item
>  where ws_bill_customer_sk = c_customer_sk
>       and ws_item_sk = i_item_sk
>       and ( c_customer_sk = 1
>             or
>             i_item_id in (select i_item_id
>                              from item
>                              where i_item_sk in (2, 3)
>                              )
>           );
> {code}
> Without not null constraints
> {code:java}
> HiveProject(ws_sales_price=[$2])
>   HiveFilter(condition=[OR(AND(<>($6, 0), IS NOT NULL($8)), =($3, 1))])
>     HiveProject(ws_item_sk=[$0], ws_bill_customer_sk=[$1], 
> ws_sales_price=[$2], c_customer_sk=[$8], i_item_sk=[$3], i_item_id=[$4], 
> c=[$5], i_item_id0=[$6], literalTrue=[$7])
>       HiveJoin(condition=[=($1, $8)], joinType=[inner], algorithm=[none], 
> cost=[not available])
>         HiveJoin(condition=[=($0, $3)], joinType=[inner], algorithm=[none], 
> cost=[not available])
>           HiveProject(ws_item_sk=[$2], ws_bill_customer_sk=[$3], 
> ws_sales_price=[$20])
>             HiveFilter(condition=[IS NOT NULL($3)])
>               HiveTableScan(table=[[default, web_sales]], 
> table:alias=[web_sales])
>           HiveJoin(condition=[=($1, $3)], joinType=[left], algorithm=[none], 
> cost=[not available])
>             HiveJoin(condition=[true], joinType=[inner], algorithm=[none], 
> cost=[not available])
>               HiveProject(i_item_sk=[$0], i_item_id=[$1])
>                 HiveTableScan(table=[[default, item]], table:alias=[item])
>               HiveProject(c=[$0])
>                 HiveAggregate(group=[{}], c=[COUNT()])
>                   HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)])
>                     HiveTableScan(table=[[default, item]], table:alias=[item])
>             HiveProject(i_item_id=[$0], literalTrue=[true])
>               HiveAggregate(group=[{1}])
>                 HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)])
>                   HiveTableScan(table=[[default, item]], table:alias=[item])
>         HiveProject(c_customer_sk=[$0])
>           HiveTableScan(table=[[default, customer]], table:alias=[customer])
> {code}
> With not null constraints
> {code:java}
> HiveProject(ws_sales_price=[$2])
>   HiveFilter(condition=[OR(IS NOT NULL($7), =($3, 1))])
>     HiveProject(ws_item_sk=[$0], ws_bill_customer_sk=[$1], 
> ws_sales_price=[$2], c_customer_sk=[$7], i_item_sk=[$3], i_item_id=[$4], 
> i_item_id0=[$5], literalTrue=[$6])
>       HiveJoin(condition=[=($1, $7)], joinType=[inner], algorithm=[none], 
> cost=[not available])
>         HiveJoin(condition=[=($0, $3)], joinType=[inner], algorithm=[none], 
> cost=[not available])
>           HiveProject(ws_item_sk=[$2], ws_bill_customer_sk=[$3], 
> ws_sales_price=[$20])
>             HiveFilter(condition=[IS NOT NULL($3)])
>               HiveTableScan(table=[[default, web_sales]], 
> table:alias=[web_sales])
>           HiveJoin(condition=[=($1, $2)], joinType=[left], algorithm=[none], 
> cost=[not available])
>             HiveProject(i_item_sk=[$0], i_item_id=[$1])
>               HiveTableScan(table=[[default, item]], table:alias=[item])
>             HiveProject(i_item_id=[$0], literalTrue=[true])
>               HiveAggregate(group=[{1}])
>                 HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)])
>                   HiveTableScan(table=[[default, item]], table:alias=[item])
>         HiveProject(c_customer_sk=[$0])
>           HiveTableScan(table=[[default, customer]], table:alias=[customer])
> {code}
> In the first plan when not null constraints was ignored there is an extra 
> {{item}} table join without join condition:
> {code:java}
>             HiveJoin(condition=[true], joinType=[inner], algorithm=[none], 
> cost=[not available])
>               HiveProject(i_item_sk=[$0], i_item_id=[$1])
>                 HiveTableScan(table=[[default, item]], table:alias=[item])
>               HiveProject(c=[$0])
>                 HiveAggregate(group=[{}], c=[COUNT()])
>                   HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)])
>                     HiveTableScan(table=[[default, item]], table:alias=[item])
> {code}
> The planner is not aware that the {{i_item_id}} column has {{not null}} 
> defined and it expects null values which needs the extra join.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Assigned] (HIVE-26043) Use constraint info when creating RexNodes

Reply via email to