[ https://issues.apache.org/jira/browse/HIVE-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sushanth Sowmyan updated HIVE-10412: ------------------------------------ Fix Version/s: (was: 1.2.0) > CBO : Calculate join selectivity when computing HiveJoin cost > ------------------------------------------------------------- > > Key: HIVE-10412 > URL: https://issues.apache.org/jira/browse/HIVE-10412 > Project: Hive > Issue Type: Sub-task > Components: CBO > Reporter: Mostafa Mokhtar > Assignee: Laljo John Pullokkaran > > This is from TPC-DS Q7 > Because we don't compute the selectivity of sub-expression in a HiveJoin we > assume that selective and non-selective joins have the similar cost. > {code} > select i_item_id, > avg(ss_quantity) agg1, > avg(ss_list_price) agg2, > avg(ss_coupon_amt) agg3, > avg(ss_sales_price) agg4 > from store_sales, customer_demographics, item > where store_sales.ss_item_sk = item.i_item_sk and > store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk and > cd_gender = 'F' and > cd_marital_status = 'W' and > cd_education_status = 'Primary' > group by i_item_id > order by i_item_id > limit 100 > {code} > Cardinality > {code} > item 462,000 > customer_demographics 1,920,800 > store_sales 82,510,879,939 > {code} > NDVs > {code} > item.i_item_sk 439501 > customer_demographics.cd_demo_sk 1835839 > store_sales.ss_cdemo_sk 1835839 > {code} > From the logs > {code} > 2015-04-20 21:09:58,055 DEBUG [main]: cost.HiveCostModel > (HiveCostModel.java:getJoinCost(60)) - Join algorithm selection for: > HiveJoin(condition=[=($0, $10)], joinType=[inner], algorithm=[none], > cost=[not available]) > HiveJoin(condition=[=($1, $6)], joinType=[inner], algorithm=[MapJoin], > cost=[{8.251089518344444E10 rows, 2.324083308641975E8 cpu, 275417.5666666666 > io}]) > HiveProject(ss_item_sk=[$1], ss_cdemo_sk=[$3], ss_quantity=[$9], > ss_list_price=[$11], ss_sales_price=[$12], ss_coupon_amt=[$18]) > HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.store_sales]]) > HiveProject(cd_demo_sk=[$0], cd_gender=[$1], cd_marital_status=[$2], > cd_education_status=[$3]) > HiveFilter(condition=[AND(=($1, 'F'), =($2, 'W'), =($3, 'Primary'))]) > > HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.customer_demographics]]) > HiveProject(i_item_sk=[$0], i_item_id=[$1]) > HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.item]]) > 2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel > (HiveCostModel.java:getJoinCost(69)) - CommonJoin cost: {6.553102534841269E8 > rows, 4.0217814199458417E18 cpu, 3.499540319862703E7 io} > 2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel > (HiveCostModel.java:getJoinCost(69)) - MapJoin cost: {6.553102534841269E8 > rows, 2.13444462E11 cpu, 1.0720709999999998E7 io} > 2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel > (HiveCostModel.java:getJoinCost(78)) - MapJoin selected > 2015-04-20 21:09:58,057 DEBUG [main]: cost.HiveCostModel > (HiveCostModel.java:getJoinCost(60)) - Join algorithm selection for: > HiveJoin(condition=[=($1, $8)], joinType=[inner], algorithm=[none], cost=[not > available]) > HiveJoin(condition=[=($0, $6)], joinType=[inner], algorithm=[MapJoin], > cost=[{8.2511341939E10 rows, 2.13444462E11 cpu, 1.0720709999999998E7 io}]) > HiveProject(ss_item_sk=[$1], ss_cdemo_sk=[$3], ss_quantity=[$9], > ss_list_price=[$11], ss_sales_price=[$12], ss_coupon_amt=[$18]) > HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.store_sales]]) > HiveProject(i_item_sk=[$0], i_item_id=[$1]) > HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.item]]) > HiveProject(cd_demo_sk=[$0], cd_gender=[$1], cd_marital_status=[$2], > cd_education_status=[$3]) > HiveFilter(condition=[AND(=($1, 'F'), =($2, 'W'), =($3, 'Primary'))]) > > HiveTableScan(table=[[tpcds_bin_partitioned_orc_30000.customer_demographics]]) > 2015-04-20 21:09:58,058 DEBUG [main]: cost.HiveCostModel > (HiveCostModel.java:getJoinCost(69)) - CommonJoin cost: {8.251089518344444E10 > rows, 2.6089279242468144E21 cpu, 4.901146588836599E9 io} > 2015-04-20 21:09:58,058 DEBUG [main]: cost.HiveCostModel > (HiveCostModel.java:getJoinCost(69)) - MapJoin cost: {8.251089518344444E10 > rows, 2.324083308641975E8 cpu, 275417.5666666666 io} > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)