[jira] [Updated] (HIVE-9713) CBO : inefficient join order created for left join outer condition

Mostafa Mokhtar (JIRA) Tue, 17 Feb 2015 19:47:25 -0800

     [ https://issues.apache.org/jira/browse/HIVE-9713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Mostafa Mokhtar updated HIVE-9713:
----------------------------------
    Description: 
For the query below, which is a subset of TPC-DS Query 80, CBO joins 
catalog_sales with catalog_returns first, although the cardinality estimate 
(CE) of that join is relatively high.
catalog_sales should be joined with the selective dimension tables first.

{code}
select count(*)
from
    catalog_sales
    ,warehouse
    ,date_dim
    ,time_dim
    ,ship_mode
where
    catalog_sales.cs_warehouse_sk = warehouse.w_warehouse_sk
    and catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
    and catalog_sales.cs_sold_time_sk = time_dim.t_time_sk
    and catalog_sales.cs_ship_mode_sk = ship_mode.sm_ship_mode_sk
    and d_year = 2002
    and t_time between 49530 and 49530+28800
    and sm_carrier in ('DIAMOND','AIRBORNE')
group by
    w_warehouse_name
    ,w_warehouse_sq_ft
    ,w_city
    ,w_county
    ,w_state
    ,w_country
    ,d_year
{code}


Logical plan from CBO debug logs 
{code}
2015-02-17 22:34:04,577 DEBUG [main]: parse.CalcitePlanner 
(CalcitePlanner.java:apply(743)) - Plan After Join Reordering:
HiveProject(catalog_page_id=[$0], sales=[$1], returns=[$2], profit=[$3]): 
rowcount = 10590.0, cumulative cost = {8.25242586823495E15 rows, 0.0 cpu, 0.0 
io}, id = 1395
  HiveAggregate(group=[{0}], agg#0=[sum($1)], agg#1=[sum($2)], 
agg#2=[sum($3)]): rowcount = 10590.0, cumulative cost = {8.25242586823495E15 
rows, 0.0 cpu, 0.0 io}, id = 1393
    HiveProject($f0=[$14], $f1=[$5], $f2=[coalesce($9, 0)], $f3=[-($6, 
coalesce($10, 0))]): rowcount = 1.368586152225262E8, cumulative cost = 
{8.25242586823495E15 rows, 0.0 cpu, 0.0 io}, id = 1391
      HiveJoin(condition=[=($3, $17)], joinType=[inner]): rowcount = 
1.368586152225262E8, cumulative cost = {8.25242586823495E15 rows, 0.0 cpu, 0.0 
io}, id = 1508
        HiveJoin(condition=[=($2, $15)], joinType=[inner]): rowcount = 
2.737172304450524E8, cumulative cost = {8.252425594517495E15 rows, 0.0 cpu, 0.0 
io}, id = 1506
          HiveJoin(condition=[=($1, $13)], joinType=[inner]): rowcount = 
8.211516913351573E8, cumulative cost = {8.252424773349804E15 rows, 0.0 cpu, 0.0 
io}, id = 1504
            HiveJoin(condition=[=($0, $11)], joinType=[inner]): rowcount = 
1.1296953399027347E11, cumulative cost = {8.252311803804096E15 rows, 0.0 cpu, 
0.0 io}, id = 1418
              HiveJoin(condition=[AND(=($2, $7), =($4, $8))], joinType=[left]): 
rowcount = 8.252311488455487E15, cumulative cost = {3.15348608E8 rows, 0.0 cpu, 
0.0 io}, id = 1413
                HiveProject(cs_sold_date_sk=[$0], cs_catalog_page_sk=[$12], 
cs_item_sk=[$15], cs_promo_sk=[$16], cs_order_number=[$17], 
cs_ext_sales_price=[$23], cs_net_profit=[$33]): rowcount = 2.86549727E8, 
cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1324
                  HiveTableScan(table=[[tpcds_bin_orc_200.catalog_sales]]): 
rowcount = 2.86549727E8, cumulative cost = {0}, id = 1136
                HiveProject(cr_item_sk=[$2], cr_order_number=[$16], 
cr_return_amount=[$18], cr_net_loss=[$26]): rowcount = 2.8798881E7, cumulative 
cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1327
                  HiveTableScan(table=[[tpcds_bin_orc_200.catalog_returns]]): 
rowcount = 2.8798881E7, cumulative cost = {0}, id = 1137
              HiveProject(d_date_sk=[$0], d_date=[$2]): rowcount = 1.0, 
cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1371
                HiveFilter(condition=[between(false, $2, 
CAST('1998-08-04'):DATE, CAST('1998-09-04'):DATE)]): rowcount = 1.0, cumulative 
cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1369
                  HiveTableScan(table=[[tpcds_bin_orc_200.date_dim]]): rowcount 
= 73049.0, cumulative cost = {0}, id = 1138
            HiveProject(cp_catalog_page_sk=[$0], cp_catalog_page_id=[$1]): 
rowcount = 11718.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1375
              HiveTableScan(table=[[tpcds_bin_orc_200.catalog_page]]): rowcount 
= 11718.0, cumulative cost = {0}, id = 1139
          HiveProject(i_item_sk=[$0], i_current_price=[$5]): rowcount = 
16000.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1381
            HiveFilter(condition=[>($5, 5E1)]): rowcount = 16000.0, cumulative 
cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1379
              HiveTableScan(table=[[tpcds_bin_orc_200.item]]): rowcount = 
48000.0, cumulative cost = {0}, id = 1140
        HiveProject(p_promo_sk=[$0], p_channel_tv=[$11]): rowcount = 225.0, 
cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1387
          HiveFilter(condition=[=($11, 'N')]): rowcount = 225.0, cumulative 
cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1385
            HiveTableScan(table=[[tpcds_bin_orc_200.promotion]]): rowcount = 
450.0, cumulative cost = {0}, id = 1141
{code}
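
As a sanity check on the plan above, the short Python sketch below redoes the 
arithmetic using only the row counts printed in the log. It suggests that the 
left outer join is estimated at exactly the product of its two input row 
counts, i.e. as if it were a cross product, which would account for the 
inflated cumulative cost further up the tree. The variable names are 
illustrative, and the interpretation is an inference from the numbers, not 
something the log itself states.

```python
# Row counts copied from the "Plan After Join Reordering" log above.
catalog_sales_rows = 286_549_727        # logged as 2.86549727E8
catalog_returns_rows = 28_798_881       # logged as 2.8798881E7
left_join_rows = 8.252311488455487e15   # HiveJoin joinType=[left], id = 1413
left_join_cost = 3.15348608e8           # cumulative cost of the same node

# The estimated output of the left join equals the product of its inputs.
cross_product = catalog_sales_rows * catalog_returns_rows
print(cross_product == left_join_rows)

# The cumulative cost of the node equals the sum of its input row counts.
print(catalog_sales_rows + catalog_returns_rows == left_join_cost)

# Estimated output rows per catalog_sales row: one per catalog_returns row.
blowup = left_join_rows / catalog_sales_rows
print(f"{blowup:.4e}")
```

Both comparisons hold for the logged values, so every catalog_sales row is 
assumed to match every catalog_returns row in this estimate.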

Explain plan 
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 1 <- Map 2 (BROADCAST_EDGE)
        Map 3 <- Map 1 (BROADCAST_EDGE)
        Map 4 <- Map 3 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 7 
(BROADCAST_EDGE)
        Reducer 5 <- Map 4 (SIMPLE_EDGE)
      DagName: mmokhtar_20150217223434_d0ab6fa9-a1a3-47a5-8138-ba7435d9aea5:4
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: catalog_sales
                  filterExpr: (((cs_sold_date_sk is not null and 
cs_catalog_page_sk is not null) and cs_item_sk is not null) and cs_promo_sk is 
not null) (type: boolean)
                  Statistics: Num rows: 286549727 Data size: 65825832570 Basic 
stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: (((cs_sold_date_sk is not null and 
cs_catalog_page_sk is not null) and cs_item_sk is not null) and cs_promo_sk is 
not null) (type: boolean)
                    Statistics: Num rows: 285112475 Data size: 7974560516 Basic 
stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: cs_sold_date_sk (type: int), 
cs_catalog_page_sk (type: int), cs_item_sk (type: int), cs_promo_sk (type: 
int), cs_order_number (type: int), cs_ext_sales_price (type: float), 
cs_net_profit (type: float)
                      outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
_col5, _col6
                      Statistics: Num rows: 285112475 Data size: 7974560516 
Basic stats: COMPLETE Column stats: COMPLETE
                      Map Join Operator
                        condition map:
                             Left Outer Join0 to 1
                        keys:
                          0 _col2 (type: int), _col4 (type: int)
                          1 _col0 (type: int), _col1 (type: int)
                        outputColumnNames: _col0, _col1, _col2, _col3, _col5, 
_col6, _col9, _col10
                        input vertices:
                          1 Map 2
                        Statistics: Num rows: 2911 Data size: 93152 Basic 
stats: COMPLETE Column stats: COMPLETE
                        Reduce Output Operator
                          key expressions: _col0 (type: int)
                          sort order: +
                          Map-reduce partition columns: _col0 (type: int)
                          Statistics: Num rows: 2911 Data size: 93152 Basic 
stats: COMPLETE Column stats: COMPLETE
                          value expressions: _col1 (type: int), _col2 (type: 
int), _col3 (type: int), _col5 (type: float), _col6 (type: float), _col9 (type: 
float), _col10 (type: float)
            Execution mode: vectorized
        Map 2
            Map Operator Tree:
                TableScan
                  alias: catalog_returns
                  filterExpr: cr_item_sk is not null (type: boolean)
                  Statistics: Num rows: 28798881 Data size: 5764329494 Basic 
stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: cr_item_sk is not null (type: boolean)
                    Statistics: Num rows: 28798881 Data size: 456171072 Basic 
stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: cr_item_sk (type: int), cr_order_number 
(type: int), cr_return_amount (type: float), cr_net_loss (type: float)
                      outputColumnNames: _col0, _col1, _col2, _col3
                      Statistics: Num rows: 28798881 Data size: 456171072 Basic 
stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int), _col1 (type: int)
                        sort order: ++
                        Map-reduce partition columns: _col0 (type: int), _col1 
(type: int)
                        Statistics: Num rows: 28798881 Data size: 456171072 
Basic stats: COMPLETE Column stats: COMPLETE
                        value expressions: _col2 (type: float), _col3 (type: 
float)
            Execution mode: vectorized
        Map 3
            Map Operator Tree:
                TableScan
                  alias: date_dim
                  filterExpr: (d_date BETWEEN 1998-08-04 AND 1998-09-04 and 
d_date_sk is not null) (type: boolean)
                  Statistics: Num rows: 73049 Data size: 81741831 Basic stats: 
COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: (d_date BETWEEN 1998-08-04 AND 1998-09-04 and 
d_date_sk is not null) (type: boolean)
                    Statistics: Num rows: 36524 Data size: 3579352 Basic stats: 
COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: d_date_sk (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 36524 Data size: 146096 Basic 
stats: COMPLETE Column stats: COMPLETE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        keys:
                          0 _col0 (type: int)
                          1 _col0 (type: int)
                        outputColumnNames: _col1, _col2, _col3, _col5, _col6, 
_col9, _col10
                        input vertices:
                          0 Map 1
                        Statistics: Num rows: 1456 Data size: 40768 Basic 
stats: COMPLETE Column stats: COMPLETE
                        Reduce Output Operator
                          key expressions: _col1 (type: int)
                          sort order: +
                          Map-reduce partition columns: _col1 (type: int)
                          Statistics: Num rows: 1456 Data size: 40768 Basic 
stats: COMPLETE Column stats: COMPLETE
                          value expressions: _col2 (type: int), _col3 (type: 
int), _col5 (type: float), _col6 (type: float), _col9 (type: float), _col10 
(type: float)
            Execution mode: vectorized
        Map 4
            Map Operator Tree:
                TableScan
                  alias: catalog_page
                  filterExpr: cp_catalog_page_sk is not null (type: boolean)
                  Statistics: Num rows: 11718 Data size: 5400282 Basic stats: 
COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: cp_catalog_page_sk is not null (type: boolean)
                    Statistics: Num rows: 11718 Data size: 1218672 Basic stats: 
COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: cp_catalog_page_sk (type: int), 
cp_catalog_page_id (type: string)
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 11718 Data size: 1218672 Basic 
stats: COMPLETE Column stats: COMPLETE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        keys:
                          0 _col1 (type: int)
                          1 _col0 (type: int)
                        outputColumnNames: _col2, _col3, _col5, _col6, _col9, 
_col10, _col14
                        input vertices:
                          0 Map 3
                        Statistics: Num rows: 1456 Data size: 180544 Basic 
stats: COMPLETE Column stats: COMPLETE
                        Map Join Operator
                          condition map:
                               Inner Join 0 to 1
                          keys:
                            0 _col2 (type: int)
                            1 _col0 (type: int)
                          outputColumnNames: _col3, _col5, _col6, _col9, 
_col10, _col14
                          input vertices:
                            1 Map 6
                          Statistics: Num rows: 486 Data size: 58320 Basic 
stats: COMPLETE Column stats: COMPLETE
                          Map Join Operator
                            condition map:
                                 Inner Join 0 to 1
                            keys:
                              0 _col3 (type: int)
                              1 _col0 (type: int)
                            outputColumnNames: _col5, _col6, _col9, _col10, 
_col14
                            input vertices:
                              1 Map 7
                            Statistics: Num rows: 243 Data size: 28188 Basic 
stats: COMPLETE Column stats: COMPLETE
                            Select Operator
                              expressions: _col14 (type: string), _col5 (type: 
float), COALESCE(_col9,0) (type: float), (_col6 - COALESCE(_col10,0)) (type: 
float)
                              outputColumnNames: _col0, _col1, _col2, _col3
                              Statistics: Num rows: 243 Data size: 28188 Basic 
stats: COMPLETE Column stats: COMPLETE
                              Group By Operator
                                aggregations: sum(_col1), sum(_col2), sum(_col3)
                                keys: _col0 (type: string)
                                mode: hash
                                outputColumnNames: _col0, _col1, _col2, _col3
                                Statistics: Num rows: 121 Data size: 15004 
Basic stats: COMPLETE Column stats: COMPLETE
                                Reduce Output Operator
                                  key expressions: _col0 (type: string)
                                  sort order: +
                                  Map-reduce partition columns: _col0 (type: 
string)
                                  Statistics: Num rows: 121 Data size: 15004 
Basic stats: COMPLETE Column stats: COMPLETE
                                  value expressions: _col1 (type: double), 
_col2 (type: double), _col3 (type: double)
            Execution mode: vectorized
        Map 6
            Map Operator Tree:
                TableScan
                  alias: item
                  filterExpr: ((i_current_price > 50.0) and i_item_sk is not 
null) (type: boolean)
                  Statistics: Num rows: 48000 Data size: 68732712 Basic stats: 
COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: ((i_current_price > 50.0) and i_item_sk is not 
null) (type: boolean)
                    Statistics: Num rows: 16000 Data size: 127832 Basic stats: 
COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: i_item_sk (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 16000 Data size: 64000 Basic stats: 
COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 16000 Data size: 64000 Basic 
stats: COMPLETE Column stats: COMPLETE
            Execution mode: vectorized
        Map 7
            Map Operator Tree:
                TableScan
                  alias: promotion
                  filterExpr: ((p_channel_tv = 'N') and p_promo_sk is not null) 
(type: boolean)
                  Statistics: Num rows: 450 Data size: 530848 Basic stats: 
COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: ((p_channel_tv = 'N') and p_promo_sk is not 
null) (type: boolean)
                    Statistics: Num rows: 225 Data size: 20025 Basic stats: 
COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: p_promo_sk (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 225 Data size: 900 Basic stats: 
COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 225 Data size: 900 Basic stats: 
COMPLETE Column stats: COMPLETE
            Execution mode: vectorized
        Reducer 5
            Reduce Operator Tree:
              Group By Operator
                aggregations: sum(VALUE._col0), sum(VALUE._col1), 
sum(VALUE._col2)
                keys: KEY._col0 (type: string)
                mode: mergepartial
                outputColumnNames: _col0, _col1, _col2, _col3
                Statistics: Num rows: 121 Data size: 15004 Basic stats: 
COMPLETE Column stats: COMPLETE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 121 Data size: 15004 Basic stats: 
COMPLETE Column stats: COMPLETE
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}
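
The two plans above can also be compared side by side. The sketch below is a 
minimal illustration that pairs a few row-count estimates from the CBO log 
with the "Num rows" statistics of the corresponding operators in the explain 
plan and reports where they disagree. The pairing of nodes to operators and 
the helper name `disagreements` are assumptions made for this example.

```python
# (CBO rowcount, explain-plan "Num rows") pairs copied from the two plans.
estimates = {
    "catalog_sales LEFT JOIN catalog_returns": (8.252311488455487e15, 2911),
    "date_dim filter on d_date": (1.0, 36524),
    "item filter on i_current_price": (16000.0, 16000),
    "promotion filter on p_channel_tv": (225.0, 225),
}


def disagreements(pairs):
    """Return the entries whose CBO and explain-plan estimates differ."""
    return {name: est for name, est in pairs.items() if est[0] != est[1]}


for name, (cbo_rows, explain_rows) in disagreements(estimates).items():
    print(f"{name}: CBO={cbo_rows:.6g}, explain={explain_rows}")
```

With the numbers as logged, only the left outer join and the date_dim filter 
differ between the two plans; the item and promotion filters agree.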

  was:
For the query below, which is a subset of TPC-DS Query 66, CBO joins 
catalog_sales with catalog_returns first, although the cardinality estimate 
(CE) of that join is relatively high.
catalog_sales should be joined with the selective dimension tables first.

{code}
select count(*)
from
    catalog_sales
    ,warehouse
    ,date_dim
    ,time_dim
    ,ship_mode
where
    catalog_sales.cs_warehouse_sk = warehouse.w_warehouse_sk
    and catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
    and catalog_sales.cs_sold_time_sk = time_dim.t_time_sk
    and catalog_sales.cs_ship_mode_sk = ship_mode.sm_ship_mode_sk
    and d_year = 2002
    and t_time between 49530 and 49530+28800
    and sm_carrier in ('DIAMOND','AIRBORNE')
group by
    w_warehouse_name
    ,w_warehouse_sq_ft
    ,w_city
    ,w_county
    ,w_state
    ,w_country
    ,d_year
{code}


Logical plan from CBO debug logs 
{code}
2015-02-17 22:34:04,577 DEBUG [main]: parse.CalcitePlanner 
(CalcitePlanner.java:apply(743)) - Plan After Join Reordering:
HiveProject(catalog_page_id=[$0], sales=[$1], returns=[$2], profit=[$3]): 
rowcount = 10590.0, cumulative cost = {8.25242586823495E15 rows, 0.0 cpu, 0.0 
io}, id = 1395
  HiveAggregate(group=[{0}], agg#0=[sum($1)], agg#1=[sum($2)], 
agg#2=[sum($3)]): rowcount = 10590.0, cumulative cost = {8.25242586823495E15 
rows, 0.0 cpu, 0.0 io}, id = 1393
    HiveProject($f0=[$14], $f1=[$5], $f2=[coalesce($9, 0)], $f3=[-($6, 
coalesce($10, 0))]): rowcount = 1.368586152225262E8, cumulative cost = 
{8.25242586823495E15 rows, 0.0 cpu, 0.0 io}, id = 1391
      HiveJoin(condition=[=($3, $17)], joinType=[inner]): rowcount = 
1.368586152225262E8, cumulative cost = {8.25242586823495E15 rows, 0.0 cpu, 0.0 
io}, id = 1508
        HiveJoin(condition=[=($2, $15)], joinType=[inner]): rowcount = 
2.737172304450524E8, cumulative cost = {8.252425594517495E15 rows, 0.0 cpu, 0.0 
io}, id = 1506
          HiveJoin(condition=[=($1, $13)], joinType=[inner]): rowcount = 
8.211516913351573E8, cumulative cost = {8.252424773349804E15 rows, 0.0 cpu, 0.0 
io}, id = 1504
            HiveJoin(condition=[=($0, $11)], joinType=[inner]): rowcount = 
1.1296953399027347E11, cumulative cost = {8.252311803804096E15 rows, 0.0 cpu, 
0.0 io}, id = 1418
              HiveJoin(condition=[AND(=($2, $7), =($4, $8))], joinType=[left]): 
rowcount = 8.252311488455487E15, cumulative cost = {3.15348608E8 rows, 0.0 cpu, 
0.0 io}, id = 1413
                HiveProject(cs_sold_date_sk=[$0], cs_catalog_page_sk=[$12], 
cs_item_sk=[$15], cs_promo_sk=[$16], cs_order_number=[$17], 
cs_ext_sales_price=[$23], cs_net_profit=[$33]): rowcount = 2.86549727E8, 
cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1324
                  HiveTableScan(table=[[tpcds_bin_orc_200.catalog_sales]]): 
rowcount = 2.86549727E8, cumulative cost = {0}, id = 1136
                HiveProject(cr_item_sk=[$2], cr_order_number=[$16], 
cr_return_amount=[$18], cr_net_loss=[$26]): rowcount = 2.8798881E7, cumulative 
cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1327
                  HiveTableScan(table=[[tpcds_bin_orc_200.catalog_returns]]): 
rowcount = 2.8798881E7, cumulative cost = {0}, id = 1137
              HiveProject(d_date_sk=[$0], d_date=[$2]): rowcount = 1.0, 
cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1371
                HiveFilter(condition=[between(false, $2, 
CAST('1998-08-04'):DATE, CAST('1998-09-04'):DATE)]): rowcount = 1.0, cumulative 
cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1369
                  HiveTableScan(table=[[tpcds_bin_orc_200.date_dim]]): rowcount 
= 73049.0, cumulative cost = {0}, id = 1138
            HiveProject(cp_catalog_page_sk=[$0], cp_catalog_page_id=[$1]): 
rowcount = 11718.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1375
              HiveTableScan(table=[[tpcds_bin_orc_200.catalog_page]]): rowcount 
= 11718.0, cumulative cost = {0}, id = 1139
          HiveProject(i_item_sk=[$0], i_current_price=[$5]): rowcount = 
16000.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1381
            HiveFilter(condition=[>($5, 5E1)]): rowcount = 16000.0, cumulative 
cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1379
              HiveTableScan(table=[[tpcds_bin_orc_200.item]]): rowcount = 
48000.0, cumulative cost = {0}, id = 1140
        HiveProject(p_promo_sk=[$0], p_channel_tv=[$11]): rowcount = 225.0, 
cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1387
          HiveFilter(condition=[=($11, 'N')]): rowcount = 225.0, cumulative 
cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1385
            HiveTableScan(table=[[tpcds_bin_orc_200.promotion]]): rowcount = 
450.0, cumulative cost = {0}, id = 1141
{code}

Explain plan 
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 1 <- Map 2 (BROADCAST_EDGE)
        Map 3 <- Map 1 (BROADCAST_EDGE)
        Map 4 <- Map 3 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 7 
(BROADCAST_EDGE)
        Reducer 5 <- Map 4 (SIMPLE_EDGE)
      DagName: mmokhtar_20150217223434_d0ab6fa9-a1a3-47a5-8138-ba7435d9aea5:4
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: catalog_sales
                  filterExpr: (((cs_sold_date_sk is not null and 
cs_catalog_page_sk is not null) and cs_item_sk is not null) and cs_promo_sk is 
not null) (type: boolean)
                  Statistics: Num rows: 286549727 Data size: 65825832570 Basic 
stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: (((cs_sold_date_sk is not null and 
cs_catalog_page_sk is not null) and cs_item_sk is not null) and cs_promo_sk is 
not null) (type: boolean)
                    Statistics: Num rows: 285112475 Data size: 7974560516 Basic 
stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: cs_sold_date_sk (type: int), 
cs_catalog_page_sk (type: int), cs_item_sk (type: int), cs_promo_sk (type: 
int), cs_order_number (type: int), cs_ext_sales_price (type: float), 
cs_net_profit (type: float)
                      outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
_col5, _col6
                      Statistics: Num rows: 285112475 Data size: 7974560516 
Basic stats: COMPLETE Column stats: COMPLETE
                      Map Join Operator
                        condition map:
                             Left Outer Join0 to 1
                        keys:
                          0 _col2 (type: int), _col4 (type: int)
                          1 _col0 (type: int), _col1 (type: int)
                        outputColumnNames: _col0, _col1, _col2, _col3, _col5, 
_col6, _col9, _col10
                        input vertices:
                          1 Map 2
                        Statistics: Num rows: 2911 Data size: 93152 Basic 
stats: COMPLETE Column stats: COMPLETE
                        Reduce Output Operator
                          key expressions: _col0 (type: int)
                          sort order: +
                          Map-reduce partition columns: _col0 (type: int)
                          Statistics: Num rows: 2911 Data size: 93152 Basic 
stats: COMPLETE Column stats: COMPLETE
                          value expressions: _col1 (type: int), _col2 (type: 
int), _col3 (type: int), _col5 (type: float), _col6 (type: float), _col9 (type: 
float), _col10 (type: float)
            Execution mode: vectorized
        Map 2
            Map Operator Tree:
                TableScan
                  alias: catalog_returns
                  filterExpr: cr_item_sk is not null (type: boolean)
                  Statistics: Num rows: 28798881 Data size: 5764329494 Basic 
stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: cr_item_sk is not null (type: boolean)
                    Statistics: Num rows: 28798881 Data size: 456171072 Basic 
stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: cr_item_sk (type: int), cr_order_number 
(type: int), cr_return_amount (type: float), cr_net_loss (type: float)
                      outputColumnNames: _col0, _col1, _col2, _col3
                      Statistics: Num rows: 28798881 Data size: 456171072 Basic 
stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int), _col1 (type: int)
                        sort order: ++
                        Map-reduce partition columns: _col0 (type: int), _col1 
(type: int)
                        Statistics: Num rows: 28798881 Data size: 456171072 
Basic stats: COMPLETE Column stats: COMPLETE
                        value expressions: _col2 (type: float), _col3 (type: 
float)
            Execution mode: vectorized
        Map 3
            Map Operator Tree:
                TableScan
                  alias: date_dim
                  filterExpr: (d_date BETWEEN 1998-08-04 AND 1998-09-04 and 
d_date_sk is not null) (type: boolean)
                  Statistics: Num rows: 73049 Data size: 81741831 Basic stats: 
COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: (d_date BETWEEN 1998-08-04 AND 1998-09-04 and 
d_date_sk is not null) (type: boolean)
                    Statistics: Num rows: 36524 Data size: 3579352 Basic stats: 
COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: d_date_sk (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 36524 Data size: 146096 Basic 
stats: COMPLETE Column stats: COMPLETE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        keys:
                          0 _col0 (type: int)
                          1 _col0 (type: int)
                        outputColumnNames: _col1, _col2, _col3, _col5, _col6, 
_col9, _col10
                        input vertices:
                          0 Map 1
                        Statistics: Num rows: 1456 Data size: 40768 Basic 
stats: COMPLETE Column stats: COMPLETE
                        Reduce Output Operator
                          key expressions: _col1 (type: int)
                          sort order: +
                          Map-reduce partition columns: _col1 (type: int)
                          Statistics: Num rows: 1456 Data size: 40768 Basic 
stats: COMPLETE Column stats: COMPLETE
                          value expressions: _col2 (type: int), _col3 (type: 
int), _col5 (type: float), _col6 (type: float), _col9 (type: float), _col10 
(type: float)
            Execution mode: vectorized
        Map 4
            Map Operator Tree:
                TableScan
                  alias: catalog_page
                  filterExpr: cp_catalog_page_sk is not null (type: boolean)
                  Statistics: Num rows: 11718 Data size: 5400282 Basic stats: 
COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: cp_catalog_page_sk is not null (type: boolean)
                    Statistics: Num rows: 11718 Data size: 1218672 Basic stats: 
COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: cp_catalog_page_sk (type: int), 
cp_catalog_page_id (type: string)
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 11718 Data size: 1218672 Basic 
stats: COMPLETE Column stats: COMPLETE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        keys:
                          0 _col1 (type: int)
                          1 _col0 (type: int)
                        outputColumnNames: _col2, _col3, _col5, _col6, _col9, _col10, _col14
                        input vertices:
                          0 Map 3
                        Statistics: Num rows: 1456 Data size: 180544 Basic stats: COMPLETE Column stats: COMPLETE
                        Map Join Operator
                          condition map:
                               Inner Join 0 to 1
                          keys:
                            0 _col2 (type: int)
                            1 _col0 (type: int)
                          outputColumnNames: _col3, _col5, _col6, _col9, _col10, _col14
                          input vertices:
                            1 Map 6
                          Statistics: Num rows: 486 Data size: 58320 Basic stats: COMPLETE Column stats: COMPLETE
                          Map Join Operator
                            condition map:
                                 Inner Join 0 to 1
                            keys:
                              0 _col3 (type: int)
                              1 _col0 (type: int)
                            outputColumnNames: _col5, _col6, _col9, _col10, _col14
                            input vertices:
                              1 Map 7
                            Statistics: Num rows: 243 Data size: 28188 Basic stats: COMPLETE Column stats: COMPLETE
                            Select Operator
                              expressions: _col14 (type: string), _col5 (type: float), COALESCE(_col9,0) (type: float), (_col6 - COALESCE(_col10,0)) (type: float)
                              outputColumnNames: _col0, _col1, _col2, _col3
                              Statistics: Num rows: 243 Data size: 28188 Basic stats: COMPLETE Column stats: COMPLETE
                              Group By Operator
                                aggregations: sum(_col1), sum(_col2), sum(_col3)
                                keys: _col0 (type: string)
                                mode: hash
                                outputColumnNames: _col0, _col1, _col2, _col3
                                Statistics: Num rows: 121 Data size: 15004 Basic stats: COMPLETE Column stats: COMPLETE
                                Reduce Output Operator
                                  key expressions: _col0 (type: string)
                                  sort order: +
                                  Map-reduce partition columns: _col0 (type: string)
                                  Statistics: Num rows: 121 Data size: 15004 Basic stats: COMPLETE Column stats: COMPLETE
                                  value expressions: _col1 (type: double), _col2 (type: double), _col3 (type: double)
            Execution mode: vectorized
        Map 6
            Map Operator Tree:
                TableScan
                  alias: item
                  filterExpr: ((i_current_price > 50.0) and i_item_sk is not null) (type: boolean)
                  Statistics: Num rows: 48000 Data size: 68732712 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: ((i_current_price > 50.0) and i_item_sk is not null) (type: boolean)
                    Statistics: Num rows: 16000 Data size: 127832 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: i_item_sk (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 16000 Data size: 64000 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 16000 Data size: 64000 Basic stats: COMPLETE Column stats: COMPLETE
            Execution mode: vectorized
        Map 7
            Map Operator Tree:
                TableScan
                  alias: promotion
                  filterExpr: ((p_channel_tv = 'N') and p_promo_sk is not null) (type: boolean)
                  Statistics: Num rows: 450 Data size: 530848 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: ((p_channel_tv = 'N') and p_promo_sk is not null) (type: boolean)
                    Statistics: Num rows: 225 Data size: 20025 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: p_promo_sk (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 225 Data size: 900 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 225 Data size: 900 Basic stats: COMPLETE Column stats: COMPLETE
            Execution mode: vectorized
        Reducer 5
            Reduce Operator Tree:
              Group By Operator
                aggregations: sum(VALUE._col0), sum(VALUE._col1), sum(VALUE._col2)
                keys: KEY._col0 (type: string)
                mode: mergepartial
                outputColumnNames: _col0, _col1, _col2, _col3
                Statistics: Num rows: 121 Data size: 15004 Basic stats: COMPLETE Column stats: COMPLETE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 121 Data size: 15004 Basic stats: COMPLETE Column stats: COMPLETE
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}
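
To make the cost numbers in the logical plan easier to read, the cumulative costs can be re-derived from the logged row counts. The Python sketch below assumes that each join's cumulative cost is the cost of its left subtree plus the row counts of its two inputs. That model is inferred from the logged numbers only; it is not stated in the log and has not been checked against the Hive source. The right-hand subtrees (scans, filters, projects) all log a cost of 0, so they are ignored. Row counts are copied from the "Plan After Join Reordering" output; the names joins and cumulative_costs are illustrative only.

```python
# Row counts copied from the "Plan After Join Reordering" log in this issue.
# Each entry: (join id, rowcount of left input, rowcount of right input),
# listed bottom-up, i.e. in the order the joins are executed.
joins = [
    (1413, 2.86549727e8, 2.8798881e7),       # catalog_sales LEFT JOIN catalog_returns
    (1418, 8.252311488455487e15, 1.0),       # result JOIN date_dim (filtered)
    (1504, 1.1296953399027347e11, 11718.0),  # result JOIN catalog_page
    (1506, 8.211516913351573e8, 16000.0),    # result JOIN item (filtered)
    (1508, 2.737172304450524e8, 225.0),      # result JOIN promotion (filtered)
]


def cumulative_costs(joins):
    """Assumed model: cost(join) = cost(left subtree) + left rows + right rows."""
    total = 0.0
    costs = {}
    for join_id, left_rows, right_rows in joins:
        total += left_rows + right_rows
        costs[join_id] = total
    return costs


costs = cumulative_costs(joins)

# Share of the final cost contributed by the estimated output of the
# left outer join (the left input of join 1418).
left_join_share = 8.252311488455487e15 / costs[1508]

for join_id, cost in costs.items():
    print(f"join id {join_id}: cumulative cost ~ {cost:.6e} rows")
print(f"left outer join output share of final cost: {left_join_share:.6f}")
```

Under that assumption the computed totals agree with the cumulative costs in the log, and the 8.25E15-row estimate for the catalog_sales / catalog_returns left join accounts for nearly all of the final cost.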


> CBO : inefficient join order created for left join outer condition
> ------------------------------------------------------------------
>
>                 Key: HIVE-9713
>                 URL: https://issues.apache.org/jira/browse/HIVE-9713
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>    Affects Versions: 0.14.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Laljo John Pullokkaran
>             Fix For: 1.2.0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
