[ 
https://issues.apache.org/jira/browse/HIVE-8280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161477#comment-14161477
 ] 

Mostafa Mokhtar commented on HIVE-8280:
---------------------------------------

The issue is not yet fixed as PK side always returns selectivity of 1 due to 
the following code :  

{code}
 if (pkSide == 1) {
      FKSideInfo fkInfo = new FKSideInfo(leftRowCount,
          leftNDV);
      PKSideInfo pkInfo = new PKSideInfo(rightRowCount,
          rightNDV,
          joinRel.getJoinType().generatesNullsOnLeft() ? 1.0 :
            isPKSideSimpleTree ?  RelMetadataQuery.getSelectivity(right, 
rightPred) : 1.0);
{code}

When a filter is applied on the PK side isPKSideSimpleTree is false.

Logs below show that the PK side has a row count of 19.27 but selectivity is 1.

{code}
          HiveProjectRel(ss_customer_sk=[$1], ss_item_sk=[$0], 
ss_ticket_number=[$3]): rowcount = 5.50076554E8, cumulative cost = 
{5.500765732727273E8 rows, 0.0 cpu, 0.0 io}, id = 170
            HiveProjectRel(ss_item_sk=[$0], ss_customer_sk=[$1], 
ss_store_sk=[$2], ss_ticket_number=[$3], s_store_sk=[$4], s_market_id=[$5]): 
rowcount = 5.50076554E8, cumulative cost = {5.500765732727273E8 rows, 0.0 cpu, 
0.0 io}, id = 212
              HiveJoinRel(condition=[=($4, $2)], joinType=[inner]): rowcount = 
5.50076554E8, cumulative cost = {5.500765732727273E8 rows, 0.0 cpu, 0.0 io}, id 
= 207
                HiveProjectRel(ss_item_sk=[$1], ss_customer_sk=[$2], 
ss_store_sk=[$6], ss_ticket_number=[$8]): rowcount = 5.50076554E8, cumulative 
cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 162
                  
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200_orig.store_sales]]): 
rowcount = 5.50076554E8, cumulative cost = {0}, id = 6
                HiveProjectRel(s_store_sk=[$0], s_market_id=[$10]): rowcount = 
19.272727272727273, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 166
                  HiveFilterRel(condition=[=($10, 4)]): rowcount = 
19.272727272727273, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 164
                    
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200_orig.store]]): rowcount 
= 212.0, cumulative cost = {0}, id = 5

{code}

> CBO : When filter is applied on dimension table PK/FK code path is not in 
> effect.
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-8280
>                 URL: https://issues.apache.org/jira/browse/HIVE-8280
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>    Affects Versions: 0.14.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Harish Butani
>             Fix For: 0.14.0
>
>         Attachments: HIVE-8280.1.patch, HIVE-8280.2.patch
>
>
> When a filter is applied on PK side joins don't qualify as PK/FK join.
> In getUniqueKeys when a filter is applied on the table the child is no  
> longer a table scan.
> {code}
>   public Set<BitSet> getUniqueKeys(ProjectRelBase rel, boolean ignoreNulls) {
>     RelNode child = rel.getChild();
>     if (!(child instanceof HiveTableScanRel)) {
>       Function<RelNode, Metadata> fn = RelMdUniqueKeys.SOURCE.apply(
>           rel.getClass(), BuiltInMetadata.UniqueKeys.class);
>       return ((BuiltInMetadata.UniqueKeys) fn.apply(rel))
>           .getUniqueKeys(ignoreNulls);
>     } 
> {code}
> Repro 
> {code}
> with ss as 
> (select 
>     ss_customer_sk, ss_item_sk, ss_ticket_number
> from
>     store_sales,
>     store
> where
>     s_store_sk = ss_store_sk
>         and s_market_id = 4), 
> sr as
> (select sr_customer_sk,sr_item_sk ,sr_ticket_number from store_returns, store 
> where s_store_sk = sr_store_sk and s_market_id=4) 
> select 
>     count(*)
> from
>     ss,
>     sr
> where
>     ss_customer_sk = sr_customer_sk
>         and ss_item_sk = sr_item_sk
>         and ss_ticket_number = sr_ticket_number;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to