Mostafa Mokhtar created HIVE-8752: ------------------------------------- Summary: Disjunction cardinality estimation has selectivity of 1 Key: HIVE-8752 URL: https://issues.apache.org/jira/browse/HIVE-8752 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Laljo John Pullokkaran Priority: Critical Fix For: 0.14.0
TPC-DS Q89 has the wrong join order. Store_sales should be joining with item first then date_dim. The issue is that the predicate on item shows a selectivity of 1 {code} ((i_category in ('Home','Books','Electronics') and i_class in ('wallpaper','parenting','musical') ) or (i_category in ('Shoes','Jewelry','Men') and i_class in ('womens','birdal','pants') )) {code} {code} HiveProjectRel(i_item_sk=[$0], i_brand=[$8], i_class=[$10], i_category=[$12]): rowcount = 462000.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4052 HiveFilterRel(condition=[OR(AND(in($12, 'Home', 'Books', 'Electronics'), in($10, 'wallpaper', 'parenting', 'musical')), AND(in($12, 'Shoes', 'Jewelry', 'Men'), in($10, 'womens', 'birdal', 'pants')))]): rowcount = 462000.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4050 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.item]]): rowcount = 462000.0, cumulative cost = {0}, id = 3818 {code} Query {code} select * from( select i_category, i_class, i_brand, s_store_name, s_company_name, d_moy, sum(ss_sales_price) sum_sales, avg(sum(ss_sales_price)) over (partition by i_category, i_brand, s_store_name, s_company_name) avg_monthly_sales from item, store_sales, date_dim, store where store_sales.ss_item_sk = item.i_item_sk and store_sales.ss_sold_date_sk = date_dim.d_date_sk and store_sales.ss_store_sk = store.s_store_sk and d_year in (2000) and ((i_category in ('Home','Books','Electronics') and i_class in ('wallpaper','parenting','musical') ) or (i_category in ('Shoes','Jewelry','Men') and i_class in ('womens','birdal','pants') )) group by i_category, i_class, i_brand, s_store_name, s_company_name, d_moy) tmp1 where case when (avg_monthly_sales <> 0) then (abs(sum_sales - avg_monthly_sales) / avg_monthly_sales) else null end > 0.1 order by sum_sales - avg_monthly_sales, s_store_name limit 100 {code} The result of the wrong join order is that the query runs in 335 seconds compared to 124 seconds with the correct join order. Removing the disjunction in the item filter produces the correct plan {code} i_category in ('Home','Books','Electronics') and i_class in ('wallpaper','parenting','musical') {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)