Mostafa Mokhtar created HIVE-8767: ------------------------------------- Summary: CBO : Join on inequality results in in-correct join order Key: HIVE-8767 URL: https://issues.apache.org/jira/browse/HIVE-8767 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Laljo John Pullokkaran Fix For: 0.15.0
Queries with hybrid joins (inner join and in-equality join) produce inefficient join order. CBO joins the two tables involved in the in-equality join first then the remaining joins are considered. The problem with that selectivity of the other joins is not taken into consideration. Queries that are affected by this are Q64 and Q72 from TPC-DS Logical plan for Q72 {code} 2014-11-06 14:13:12,169 DEBUG [main]: parse.SemanticAnalyzer (SemanticAnalyzer.java:apply(12631)) - Plan After Join Reordering: HiveSortRel(fetch=[100]): rowcount = 139827.8175849229, cumulative cost = {7.195499709572942E13 rows, 279655.6351698458 cpu, 0.0 io}, id = 2037 HiveSortRel(sort0=[$3], sort1=[$0], sort2=[$1], sort3=[$2], dir0=[DESC], dir1=[ASC], dir2=[ASC], dir3=[ASC]): rowcount = 139827.8175849229, cumulative cost = {7.195497058847592E13 rows, 139827.8175849229 cpu, 0.0 io}, id = 2035 HiveProjectRel(i_item_desc=[$0], w_warehouse_name=[$1], d_week_seq=[$2], total_cnt=[$3]): rowcount = 139827.8175849229, cumulative cost = {7.195494408122242E13 rows, 0.0 cpu, 0.0 io}, id = 2033 HiveAggregateRel(group=[{0, 1, 2}], agg#0=[count()]): rowcount = 139827.8175849229, cumulative cost = {7.195494408122242E13 rows, 0.0 cpu, 0.0 io}, id = 2031 HiveProjectRel($f0=[$13], $f1=[$11], $f2=[$20]): rowcount = 106451.860966184, cumulative cost = {7.195494408122242E13 rows, 0.0 cpu, 0.0 io}, id = 2029 HiveFilterRel(condition=[>(CAST($25):DOUBLE, +(CAST($19):DOUBLE, CAST(5):DOUBLE))]): rowcount = 106451.860966184, cumulative cost = {7.195494408122242E13 rows, 0.0 cpu, 0.0 io}, id = 2027 HiveProjectRel(cs_ship_date_sk=[$2], cs_bill_cdemo_sk=[$3], cs_bill_hdemo_sk=[$4], cs_item_sk=[$5], cs_quantity=[$6], cs_sold_date_sk=[$7], inv_item_sk=[$8], inv_warehouse_sk=[$9], inv_quantity_on_hand=[$10], inv_date_sk=[$11], w_warehouse_sk=[$24], w_warehouse_name=[$25], i_item_sk=[$0], i_item_desc=[$1], cd_demo_sk=[$20], cd_marital_status=[$21], hd_demo_sk=[$18], hd_buy_potential=[$19], d_date_sk=[$12], d_date=[$13], d_week_seq=[$14], d_year=[$15], d_date_sk0=[$16], d_week_seq0=[$17], d_date_sk1=[$22], d_date0=[$23]): rowcount = 319355.582898552, cumulative cost = {7.195494408122242E13 rows, 0.0 cpu, 0.0 io}, id = 2451 HiveJoinRel(condition=[=($24, $9)], joinType=[inner]): rowcount = 319355.582898552, cumulative cost = {7.195494408122242E13 rows, 0.0 cpu, 0.0 io}, id = 2449 HiveJoinRel(condition=[=($2, $22)], joinType=[inner]): rowcount = 319355.582898552, cumulative cost = {7.195494376183984E13 rows, 0.0 cpu, 0.0 io}, id = 2447 HiveJoinRel(condition=[=($0, $5)], joinType=[inner]): rowcount = 319355.582898552, cumulative cost = {7.195494336943527E13 rows, 0.0 cpu, 0.0 io}, id = 2445 HiveProjectRel(i_item_sk=[$0], i_item_desc=[$4]): rowcount = 462000.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1997 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.item]]): rowcount = 462000.0, cumulative cost = {0}, id = 1645 HiveJoinRel(condition=[=($1, $18)], joinType=[inner]): rowcount = 319355.582898552, cumulative cost = {7.195494258807969E13 rows, 0.0 cpu, 0.0 io}, id = 2443 HiveJoinRel(condition=[=($2, $16)], joinType=[inner]): rowcount = 2235489.080289864, cumulative cost = {7.195494007819061E13 rows, 0.0 cpu, 0.0 io}, id = 2441 HiveJoinRel(condition=[AND(=($9, $14), =($12, $15))], joinType=[inner]): rowcount = 5.588722700724658E7, cumulative cost = {7.195488419067561E13 rows, 0.0 cpu, 0.0 io}, id = 2439 HiveJoinRel(condition=[=($5, $10)], joinType=[inner]): rowcount = 5.732184228903609E9, cumulative cost = {7.19491519333977E13 rows, 0.0 cpu, 0.0 io}, id = 2080 HiveFilterRel(condition=[<($8, $4)]): rowcount = 7.190451896736688E13, cumulative cost = {4.4632966025E10 rows, 0.0 cpu, 0.0 io}, id = 1991 HiveProjectRel(cs_ship_date_sk=[$0], cs_bill_cdemo_sk=[$1], cs_bill_hdemo_sk=[$2], cs_item_sk=[$3], cs_quantity=[$4], cs_sold_date_sk=[$5], inv_item_sk=[$6], inv_warehouse_sk=[$7], inv_quantity_on_hand=[$8], inv_date_sk=[$9]): rowcount = 2.1571355690210062E14, cumulative cost = {4.4632966025E10 rows, 0.0 cpu, 0.0 io}, id = 2072 HiveJoinRel(condition=[=($3, $6)], joinType=[inner]): rowcount = 2.1571355690210062E14, cumulative cost = {4.4632966025E10 rows, 0.0 cpu, 0.0 io}, id = 2067 HiveProjectRel(cs_ship_date_sk=[$1], cs_bill_cdemo_sk=[$3], cs_bill_hdemo_sk=[$4], cs_item_sk=[$14], cs_quantity=[$17], cs_sold_date_sk=[$33]): rowcount = 4.3005109025E10, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1925 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.catalog_sales]]): rowcount = 4.3005109025E10, cumulative cost = {0}, id = 1642 HiveProjectRel(inv_item_sk=[$0], inv_warehouse_sk=[$1], inv_quantity_on_hand=[$2], inv_date_sk=[$3]): rowcount = 1.627857E9, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1928 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.inventory]]): rowcount = 1.627857E9, cumulative cost = {0}, id = 1644 HiveProjectRel(d_date_sk=[$0], d_date=[$2], d_week_seq=[$4], d_year=[$6]): rowcount = 5.823421556122448, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2015 HiveFilterRel(condition=[AND(=($6, 2001), =($6, 2001))]): rowcount = 5.823421556122448, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2013 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]): rowcount = 73049.0, cumulative cost = {0}, id = 1640 HiveProjectRel(d_date_sk=[$0], d_week_seq=[$4]): rowcount = 73049.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2019 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]): rowcount = 73049.0, cumulative cost = {0}, id = 1640 HiveProjectRel(hd_demo_sk=[$0], hd_buy_potential=[$2]): rowcount = 288.00000000000006, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2009 HiveFilterRel(condition=[AND(=($2, '1001-5000'), =($2, '1001-5000'))]): rowcount = 288.00000000000006, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2007 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.household_demographics]]): rowcount = 7200.0, cumulative cost = {0}, id = 1639 HiveProjectRel(cd_demo_sk=[$0], cd_marital_status=[$2]): rowcount = 274400.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2003 HiveFilterRel(condition=[=($2, 'M')]): rowcount = 274400.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2001 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.customer_demographics]]): rowcount = 1920800.0, cumulative cost = {0}, id = 1646 HiveProjectRel(d_date_sk=[$0], d_date=[$2]): rowcount = 73049.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2023 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]): rowcount = 73049.0, cumulative cost = {0}, id = 1640 HiveProjectRel(w_warehouse_sk=[$0], w_warehouse_name=[$2]): rowcount = 27.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1993 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.warehouse]]): rowcount = 27.0, cumulative cost = {0}, id = 1647 {code} Logical plan for Q72 if the join order in the query is changed {code} 2014-11-06 14:14:54,469 DEBUG [main]: parse.SemanticAnalyzer (SemanticAnalyzer.java:apply(12631)) - Plan After Join Reordering: HiveSortRel(fetch=[100]): rowcount = 139827.81758492283, cumulative cost = {4.472416182114529E10 rows, 279655.63516984566 cpu, 0.0 io}, id = 2860 HiveSortRel(sort0=[$3], sort1=[$0], sort2=[$1], sort3=[$2], dir0=[DESC], dir1=[ASC], dir2=[ASC], dir3=[ASC]): rowcount = 139827.81758492283, cumulative cost = {4.469765456764418E10 rows, 139827.81758492283 cpu, 0.0 io}, id = 2858 HiveProjectRel(i_item_desc=[$0], w_warehouse_name=[$1], d_week_seq=[$2], total_cnt=[$3]): rowcount = 139827.81758492283, cumulative cost = {4.4671147314143074E10 rows, 0.0 cpu, 0.0 io}, id = 2856 HiveAggregateRel(group=[{0, 1, 2}], agg#0=[count()]): rowcount = 139827.81758492283, cumulative cost = {4.4671147314143074E10 rows, 0.0 cpu, 0.0 io}, id = 2854 HiveProjectRel($f0=[$21], $f1=[$19], $f2=[$8]): rowcount = 106451.86096618396, cumulative cost = {4.4671147314143074E10 rows, 0.0 cpu, 0.0 io}, id = 2852 HiveFilterRel(condition=[>(CAST($25):DOUBLE, +(CAST($7):DOUBLE, CAST(5):DOUBLE))]): rowcount = 106451.86096618396, cumulative cost = {4.4671147314143074E10 rows, 0.0 cpu, 0.0 io}, id = 2850 HiveProjectRel(cs_ship_date_sk=[$2], cs_bill_cdemo_sk=[$3], cs_bill_hdemo_sk=[$4], cs_item_sk=[$5], cs_quantity=[$6], cs_sold_date_sk=[$7], d_date_sk=[$8], d_date=[$9], d_week_seq=[$10], d_year=[$11], cd_demo_sk=[$12], cd_marital_status=[$13], hd_demo_sk=[$14], hd_buy_potential=[$15], inv_item_sk=[$16], inv_warehouse_sk=[$17], inv_quantity_on_hand=[$18], inv_date_sk=[$19], w_warehouse_sk=[$24], w_warehouse_name=[$25], i_item_sk=[$0], i_item_desc=[$1], d_date_sk0=[$20], d_week_seq0=[$21], d_date_sk1=[$22], d_date0=[$23]): rowcount = 319355.5828985519, cumulative cost = {4.4671147314143074E10 rows, 0.0 cpu, 0.0 io}, id = 3081 HiveJoinRel(condition=[=($24, $17)], joinType=[inner]): rowcount = 319355.5828985519, cumulative cost = {4.4671147314143074E10 rows, 0.0 cpu, 0.0 io}, id = 3079 HiveJoinRel(condition=[=($2, $22)], joinType=[inner]): rowcount = 319355.5828985519, cumulative cost = {4.467082793156017E10 rows, 0.0 cpu, 0.0 io}, id = 3077 HiveJoinRel(condition=[=($0, $5)], joinType=[inner]): rowcount = 319355.5828985519, cumulative cost = {4.467043552697727E10 rows, 0.0 cpu, 0.0 io}, id = 3075 HiveProjectRel(i_item_sk=[$0], i_item_desc=[$4]): rowcount = 462000.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2838 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.item]]): rowcount = 462000.0, cumulative cost = {0}, id = 2462 HiveJoinRel(condition=[AND(=($17, $18), =($8, $19))], joinType=[inner]): rowcount = 319355.5828985519, cumulative cost = {4.466965417139437E10 rows, 0.0 cpu, 0.0 io}, id = 2991 HiveFilterRel(condition=[<($16, $4)]): rowcount = 3.2755338450877763E7, cumulative cost = {4.46368257839435E10 rows, 0.0 cpu, 0.0 io}, id = 2832 HiveProjectRel(cs_ship_date_sk=[$6], cs_bill_cdemo_sk=[$7], cs_bill_hdemo_sk=[$8], cs_item_sk=[$9], cs_quantity=[$10], cs_sold_date_sk=[$11], d_date_sk=[$12], d_date=[$13], d_week_seq=[$14], d_year=[$15], cd_demo_sk=[$4], cd_marital_status=[$5], hd_demo_sk=[$16], hd_buy_potential=[$17], inv_item_sk=[$0], inv_warehouse_sk=[$1], inv_quantity_on_hand=[$2], inv_date_sk=[$3]): rowcount = 9.82660153526333E7, cumulative cost = {4.46368257839435E10 rows, 0.0 cpu, 0.0 io}, id = 2987 HiveJoinRel(condition=[=($9, $0)], joinType=[inner]): rowcount = 9.82660153526333E7, cumulative cost = {4.46368257839435E10 rows, 0.0 cpu, 0.0 io}, id = 2985 HiveProjectRel(inv_item_sk=[$0], inv_warehouse_sk=[$1], inv_quantity_on_hand=[$2], inv_date_sk=[$3]): rowcount = 1.627857E9, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2828 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.inventory]]): rowcount = 1.627857E9, cumulative cost = {0}, id = 2461 HiveJoinRel(condition=[=($3, $0)], joinType=[inner]): rowcount = 19590.519781796647, cumulative cost = {4.300894919342371E10 rows, 0.0 cpu, 0.0 io}, id = 2983 HiveProjectRel(cd_demo_sk=[$0], cd_marital_status=[$2]): rowcount = 274400.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2818 HiveFilterRel(condition=[=($2, 'M')]): rowcount = 274400.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2816 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.customer_demographics]]): rowcount = 1920800.0, cumulative cost = {0}, id = 2463 HiveJoinRel(condition=[=($2, $10)], joinType=[inner]): rowcount = 137133.63847257654, cumulative cost = {4.300853765978524E10 rows, 0.0 cpu, 0.0 io}, id = 2981 HiveJoinRel(condition=[=($5, $6)], joinType=[inner]): rowcount = 3428340.961814413, cumulative cost = {4.3005109030823425E10 rows, 0.0 cpu, 0.0 io}, id = 2901 HiveProjectRel(cs_ship_date_sk=[$1], cs_bill_cdemo_sk=[$3], cs_bill_hdemo_sk=[$4], cs_item_sk=[$14], cs_quantity=[$17], cs_sold_date_sk=[$33]): rowcount = 4.3005109025E10, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2808 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.catalog_sales]]): rowcount = 4.3005109025E10, cumulative cost = {0}, id = 2459 HiveProjectRel(d_date_sk=[$0], d_date=[$2], d_week_seq=[$4], d_year=[$6]): rowcount = 5.823421556122448, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2812 HiveFilterRel(condition=[AND(=($6, 2001), =($6, 2001))]): rowcount = 5.823421556122448, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2810 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]): rowcount = 73049.0, cumulative cost = {0}, id = 2457 HiveProjectRel(hd_demo_sk=[$0], hd_buy_potential=[$2]): rowcount = 288.00000000000006, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2824 HiveFilterRel(condition=[AND(=($2, '1001-5000'), =($2, '1001-5000'))]): rowcount = 288.00000000000006, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2822 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.household_demographics]]): rowcount = 7200.0, cumulative cost = {0}, id = 2456 HiveProjectRel(d_date_sk=[$0], d_week_seq=[$4]): rowcount = 73049.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2842 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]): rowcount = 73049.0, cumulative cost = {0}, id = 2457 HiveProjectRel(d_date_sk=[$0], d_date=[$2]): rowcount = 73049.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2846 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]): rowcount = 73049.0, cumulative cost = {0}, id = 2457 HiveProjectRel(w_warehouse_sk=[$0], w_warehouse_name=[$2]): rowcount = 27.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2834 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.warehouse]]): rowcount = 27.0, cumulative cost = {0}, id = 2464 {code} Logical plan after replacing "inv_quantity_on_hand < cs_quantity" with inv_quantity_on_hand = 528 {code} 2014-11-06 14:18:17,149 DEBUG [main]: parse.SemanticAnalyzer (SemanticAnalyzer.java:apply(12631)) - Plan After Join Reordering: HiveSortRel(fetch=[100]): rowcount = 863.6667057213618, cumulative cost = {4.301228282853069E10 rows, 1727.3334114427237 cpu, 0.0 io}, id = 4382 HiveSortRel(sort0=[$3], sort1=[$0], sort2=[$1], sort3=[$2], dir0=[DESC], dir1=[ASC], dir2=[ASC], dir3=[ASC]): rowcount = 863.6667057213618, cumulative cost = {4.301218939793791E10 rows, 863.6667057213618 cpu, 0.0 io}, id = 4380 HiveProjectRel(i_item_desc=[$0], w_warehouse_name=[$1], d_week_seq=[$2], total_cnt=[$3]): rowcount = 863.6667057213618, cumulative cost = {4.301209596734513E10 rows, 0.0 cpu, 0.0 io}, id = 4378 HiveAggregateRel(group=[{0, 1, 2}], agg#0=[count()]): rowcount = 863.6667057213618, cumulative cost = {4.301209596734513E10 rows, 0.0 cpu, 0.0 io}, id = 4376 HiveProjectRel($f0=[$12], $f1=[$10], $f2=[$19]): rowcount = 462.4898665205019, cumulative cost = {4.301209596734513E10 rows, 0.0 cpu, 0.0 io}, id = 4374 HiveFilterRel(condition=[>(CAST($24):DOUBLE, +(CAST($18):DOUBLE, CAST(5):DOUBLE))]): rowcount = 462.4898665205019, cumulative cost = {4.301209596734513E10 rows, 0.0 cpu, 0.0 io}, id = 4372 HiveProjectRel(cs_ship_date_sk=[$12], cs_bill_cdemo_sk=[$13], cs_bill_hdemo_sk=[$14], cs_item_sk=[$15], cs_sold_date_sk=[$16], inv_item_sk=[$2], inv_warehouse_sk=[$3], inv_quantity_on_hand=[$4], inv_date_sk=[$5], w_warehouse_sk=[$6], w_warehouse_name=[$7], i_item_sk=[$8], i_item_desc=[$9], cd_demo_sk=[$10], cd_marital_status=[$11], hd_demo_sk=[$21], hd_buy_potential=[$22], d_date_sk=[$17], d_date=[$18], d_week_seq=[$19], d_year=[$20], d_date_sk0=[$23], d_week_seq0=[$24], d_date_sk1=[$0], d_date0=[$1]): rowcount = 1387.4695995615057, cumulative cost = {4.301209596734513E10 rows, 0.0 cpu, 0.0 io}, id = 4853 HiveJoinRel(condition=[=($12, $0)], joinType=[inner]): rowcount = 1387.4695995615057, cumulative cost = {4.301209596734513E10 rows, 0.0 cpu, 0.0 io}, id = 4851 HiveProjectRel(d_date_sk=[$0], d_date=[$2]): rowcount = 73049.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4368 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]): rowcount = 73049.0, cumulative cost = {0}, id = 3975 HiveJoinRel(condition=[AND(=($3, $21), =($17, $22))], joinType=[inner]): rowcount = 1387.4695995615057, cumulative cost = {4.3012021530875534E10 rows, 0.0 cpu, 0.0 io}, id = 4849 HiveJoinRel(condition=[=($13, $0)], joinType=[inner]): rowcount = 142308.57000041206, cumulative cost = {4.3011806173305534E10 rows, 0.0 cpu, 0.0 io}, id = 4847 HiveJoinRel(condition=[=($4, $1)], joinType=[inner]): rowcount = 2355798.8422575975, cumulative cost = {2355825.8422575975 rows, 0.0 cpu, 0.0 io}, id = 4460 HiveProjectRel(inv_item_sk=[$0], inv_warehouse_sk=[$1], inv_quantity_on_hand=[$2], inv_date_sk=[$3]): rowcount = 2355798.8422575975, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4334 HiveFilterRel(condition=[=($2, 528)]): rowcount = 2355798.8422575975, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4332 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.inventory]]): rowcount = 1.627857E9, cumulative cost = {0}, id = 3979 HiveProjectRel(w_warehouse_sk=[$0], w_warehouse_name=[$2]): rowcount = 27.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4338 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.warehouse]]): rowcount = 27.0, cumulative cost = {0}, id = 3982 HiveJoinRel(condition=[=($0, $7)], joinType=[inner]): rowcount = 19590.519781796647, cumulative cost = {4.30094307839435E10 rows, 0.0 cpu, 0.0 io}, id = 4845 HiveProjectRel(i_item_sk=[$0], i_item_desc=[$4]): rowcount = 462000.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4342 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.item]]): rowcount = 462000.0, cumulative cost = {0}, id = 3980 HiveJoinRel(condition=[=($3, $0)], joinType=[inner]): rowcount = 19590.519781796647, cumulative cost = {4.300894919342371E10 rows, 0.0 cpu, 0.0 io}, id = 4843 HiveProjectRel(cd_demo_sk=[$0], cd_marital_status=[$2]): rowcount = 274400.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4348 HiveFilterRel(condition=[=($2, 'M')]): rowcount = 274400.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4346 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.customer_demographics]]): rowcount = 1920800.0, cumulative cost = {0}, id = 3981 HiveJoinRel(condition=[=($2, $9)], joinType=[inner]): rowcount = 137133.63847257654, cumulative cost = {4.300853765978524E10 rows, 0.0 cpu, 0.0 io}, id = 4841 HiveJoinRel(condition=[=($4, $5)], joinType=[inner]): rowcount = 3428340.961814413, cumulative cost = {4.3005109030823425E10 rows, 0.0 cpu, 0.0 io}, id = 4428 HiveProjectRel(cs_ship_date_sk=[$1], cs_bill_cdemo_sk=[$3], cs_bill_hdemo_sk=[$4], cs_item_sk=[$14], cs_sold_date_sk=[$33]): rowcount = 4.3005109025E10, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4330 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.catalog_sales]]): rowcount = 4.3005109025E10, cumulative cost = {0}, id = 3977 HiveProjectRel(d_date_sk=[$0], d_date=[$2], d_week_seq=[$4], d_year=[$6]): rowcount = 5.823421556122448, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4360 HiveFilterRel(condition=[AND(=($6, 2001), =($6, 2001))]): rowcount = 5.823421556122448, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4358 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]): rowcount = 73049.0, cumulative cost = {0}, id = 3975 HiveProjectRel(hd_demo_sk=[$0], hd_buy_potential=[$2]): rowcount = 288.00000000000006, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4354 HiveFilterRel(condition=[AND(=($2, '1001-5000'), =($2, '1001-5000'))]): rowcount = 288.00000000000006, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4352 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.household_demographics]]): rowcount = 7200.0, cumulative cost = {0}, id = 3974 HiveProjectRel(d_date_sk=[$0], d_week_seq=[$4]): rowcount = 73049.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4364 HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]): rowcount = 73049.0, cumulative cost = {0}, id = 3975 {code} Q72 {code} select i_item_desc ,w_warehouse_name ,d1.d_week_seq ,count(case when p_promo_sk is null then 1 else 0 end) no_promo ,count(case when p_promo_sk is not null then 1 else 0 end) promo ,count(*) total_cnt from catalog_sales join inventory on (catalog_sales.cs_item_sk = inventory.inv_item_sk) join warehouse on (warehouse.w_warehouse_sk=inventory.inv_warehouse_sk) join item on (item.i_item_sk = catalog_sales.cs_item_sk) join customer_demographics on (catalog_sales.cs_bill_cdemo_sk = customer_demographics.cd_demo_sk) join household_demographics on (catalog_sales.cs_bill_hdemo_sk = household_demographics.hd_demo_sk) join date_dim d1 on (catalog_sales.cs_sold_date_sk = d1.d_date_sk) join date_dim d2 on (inventory.inv_date_sk = d2.d_date_sk) join date_dim d3 on (catalog_sales.cs_ship_date_sk = d3.d_date_sk) left outer join promotion on (catalog_sales.cs_promo_sk=promotion.p_promo_sk) left outer join catalog_returns on (catalog_returns.cr_item_sk = catalog_sales.cs_item_sk and catalog_returns.cr_order_number = catalog_sales.cs_order_number) where d1.d_week_seq = d2.d_week_seq and inv_quantity_on_hand < cs_quantity and d3.d_date > d1.d_date + 5 and hd_buy_potential = '1001-5000' and d1.d_year = 2001 and hd_buy_potential = '1001-5000' and cd_marital_status = 'M' and d1.d_year = 2001 group by i_item_desc,w_warehouse_name,d1.d_week_seq order by total_cnt desc, i_item_desc, w_warehouse_name, d_week_seq limit 100 {code} Q64 {code} select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk JOIN store ON store_sales.ss_store_sk = store.s_store_sk JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= cd1.cd_demo_sk JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = cd2.cd_demo_sk JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = hd1.hd_demo_sk JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = hd2.hd_demo_sk JOIN customer_address ad1 ON store_sales.ss_addr_sk = ad1.ca_address_sk JOIN customer_address ad2 ON customer.c_current_addr_sk = ad2.ca_address_sk JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk JOIN item ON store_sales.ss_item_sk = item.i_item_sk JOIN (select cs_item_sk ,sum(cs_ext_list_price) as sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund from catalog_sales JOIN catalog_returns ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk and catalog_sales.cs_order_number = catalog_returns.cr_order_number group by cs_item_sk having sum(cs_ext_list_price)>2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) cs_ui ON store_sales.ss_item_sk = cs_ui.cs_item_sk WHERE cd1.cd_marital_status <> cd2.cd_marital_status and i_color in ('maroon','burnished','dim','steel','navajo','chocolate') and i_current_price between 35 and 35 + 10 and i_current_price between 35 + 1 and 35 + 15 group by i_product_name ,i_item_sk ,s_store_name ,s_zip ,ad1.ca_street_number ,ad1.ca_street_name ,ad1.ca_city ,ad1.ca_zip ,ad2.ca_street_number ,ad2.ca_street_name ,ad2.ca_city ,ad2.ca_zip ,d1.d_year ,d2.d_year ,d3.d_year {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)