[
https://issues.apache.org/jira/browse/HIVE-9604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mostafa Mokhtar updated HIVE-9604:
----------------------------------
Component/s: CBO
Affects Version/s: 0.14.0
> CBO : Presence of hybrid join condition sets of join order optimizations
> ------------------------------------------------------------------------
>
> Key: HIVE-9604
> URL: https://issues.apache.org/jira/browse/HIVE-9604
> Project: Hive
> Issue Type: New Feature
> Components: CBO
> Affects Versions: 0.14.0
> Reporter: Mostafa Mokhtar
> Assignee: Laljo John Pullokkaran
>
> When a query has a join between two tables on hybrid join condition
> (conjunction + disjunction) the resulting join order from CBO is suboptimal.
> Re-writing the query results in 9x performance improvement.
> This was observed in several TPC-DS queries like Q72 and Q64.
> The culprit join conditions are :
> {code}
> catalog_sales.cs_item_sk = inventory.inv_item_sk
> and inv_quantity_on_hand < cs_quantity
> {code}
> This is a simplified version of Q72.
> {code}
> select count(*) total_cnt
> from catalog_sales
> join inventory on (catalog_sales.cs_item_sk = inventory.inv_item_sk)
> join warehouse on (warehouse.w_warehouse_sk=inventory.inv_warehouse_sk)
> join item on (item.i_item_sk = catalog_sales.cs_item_sk)
> join customer_demographics on (catalog_sales.cs_bill_cdemo_sk =
> customer_demographics.cd_demo_sk)
> join household_demographics on (catalog_sales.cs_bill_hdemo_sk =
> household_demographics.hd_demo_sk)
> join date_dim d1 on (catalog_sales.cs_sold_date_sk = d1.d_date_sk)
> join date_dim d2 on (inventory.inv_date_sk = d2.d_date_sk)
> join date_dim d3 on (catalog_sales.cs_ship_date_sk = d3.d_date_sk)
> where d1.d_week_seq = d2.d_week_seq
> and inv_quantity_on_hand < cs_quantity
> and d3.d_date > d1.d_date + 5
> and hd_buy_potential = '1001-5000'
> and d1.d_year = 2001
> and hd_buy_potential = '1001-5000'
> and cd_marital_status = 'M'
> and d1.d_year = 2001;
> {code}
> If the inventory table is moved down in the join order a more efficient plan
> is generated
> Modified query
> {code}
> select count(*) total_cnt
> from catalog_sales
> join item on (item.i_item_sk = catalog_sales.cs_item_sk)
> join customer_demographics on (catalog_sales.cs_bill_cdemo_sk =
> customer_demographics.cd_demo_sk)
> join household_demographics on (catalog_sales.cs_bill_hdemo_sk =
> household_demographics.hd_demo_sk)
> join date_dim d1 on (catalog_sales.cs_sold_date_sk = d1.d_date_sk)
> join date_dim d3 on (catalog_sales.cs_ship_date_sk = d3.d_date_sk)
> join inventory on (catalog_sales.cs_item_sk = inventory.inv_item_sk)
> join warehouse on (warehouse.w_warehouse_sk=inventory.inv_warehouse_sk)
> join date_dim d2 on (inventory.inv_date_sk = d2.d_date_sk)
> where d1.d_week_seq = d2.d_week_seq
> and inv_quantity_on_hand < cs_quantity
> and d3.d_date > d1.d_date + 5
> and hd_buy_potential = '1001-5000'
> and d1.d_year = 2001
> and hd_buy_potential = '1001-5000'
> and cd_marital_status = 'M'
> and d1.d_year = 2001;
> {code}
> Plan with base query notice how catalog_sales joins with inventory first
> {code}
> STAGE PLANS:
> Stage: Stage-1
> Tez
> Edges:
> Map 3 <- Map 1 (BROADCAST_EDGE), Map 10 (BROADCAST_EDGE), Map 11
> (BROADCAST_EDGE), Map 12 (BROADCAST_EDGE), Map 13 (BROADCAST_EDGE), Map 2
> (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 7 (BROADCAST_EDGE), Map 8
> (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE)
> Reducer 4 <- Map 3 (SIMPLE_EDGE)
> Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
> DagName: mmokhtar_20141015151414_a08eae06-7250-4833-9e1d-8e58eb69780e:1
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: d1
> filterExpr: (d_date_sk is not null and d_week_seq is not
> null) (type: boolean)
> Statistics: Num rows: 73049 Data size: 81741831 Basic
> stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (d_date_sk is not null and d_week_seq is not
> null) (type: boolean)
> Statistics: Num rows: 73049 Data size: 584392 Basic
> stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: d_date_sk (type: int), d_week_seq (type:
> int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 73049 Data size: 584392 Basic
> stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int), _col1 (type: int)
> sort order: ++
> Map-reduce partition columns: _col0 (type: int),
> _col1 (type: int)
> Statistics: Num rows: 73049 Data size: 584392 Basic
> stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col0 (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 73049 Data size: 0 Basic stats:
> PARTIAL Column stats: COMPLETE
> Group By Operator
> keys: _col0 (type: int)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 73049 Data size: 0 Basic
> stats: PARTIAL Column stats: COMPLETE
> Dynamic Partitioning Event Operator
> Target Input: inventory
> Partition key expr: inv_date_sk
> Statistics: Num rows: 73049 Data size: 0 Basic
> stats: PARTIAL Column stats: COMPLETE
> Target column: inv_date_sk
> Target Vertex: Map 7
> Execution mode: vectorized
> Map 10
> Map Operator Tree:
> TableScan
> alias: catalog_returns
> Statistics: Num rows: 28798881 Data size: 2942039156 Basic
> stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: cr_item_sk (type: int), cr_order_number
> (type: int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 28798881 Data size: 230391048 Basic
> stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int), _col1 (type: int)
> sort order: ++
> Map-reduce partition columns: _col0 (type: int), _col1
> (type: int)
> Statistics: Num rows: 28798881 Data size: 230391048
> Basic stats: COMPLETE Column stats: COMPLETE
> Execution mode: vectorized
> Map 11
> Map Operator Tree:
> TableScan
> alias: customer_demographics
> filterExpr: ((cd_marital_status = 'M') and cd_demo_sk is
> not null) (type: boolean)
> Statistics: Num rows: 1920800 Data size: 718379200 Basic
> stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: ((cd_marital_status = 'M') and cd_demo_sk is
> not null) (type: boolean)
> Statistics: Num rows: 274400 Data size: 24421600 Basic
> stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: cd_demo_sk (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 274400 Data size: 1097600 Basic
> stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 274400 Data size: 1097600 Basic
> stats: COMPLETE Column stats: COMPLETE
> Execution mode: vectorized
> Map 12
> Map Operator Tree:
> TableScan
> alias: promotion
> Statistics: Num rows: 450 Data size: 530848 Basic stats:
> COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: p_promo_sk (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 450 Data size: 1800 Basic stats:
> COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 450 Data size: 1800 Basic stats:
> COMPLETE Column stats: COMPLETE
> Execution mode: vectorized
> Map 13
> Map Operator Tree:
> TableScan
> alias: household_demographics
> filterExpr: ((hd_buy_potential = '1001-5000') and
> hd_demo_sk is not null) (type: boolean)
> Statistics: Num rows: 7200 Data size: 770400 Basic stats:
> COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: ((hd_buy_potential = '1001-5000') and
> hd_demo_sk is not null) (type: boolean)
> Statistics: Num rows: 1440 Data size: 138240 Basic stats:
> COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: hd_demo_sk (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 1440 Data size: 5760 Basic stats:
> COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 1440 Data size: 5760 Basic
> stats: COMPLETE Column stats: COMPLETE
> Execution mode: vectorized
> Map 2
> Map Operator Tree:
> TableScan
> alias: d1
> filterExpr: d_date_sk is not null (type: boolean)
> Statistics: Num rows: 73049 Data size: 81741831 Basic
> stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: d_date_sk is not null (type: boolean)
> Statistics: Num rows: 73049 Data size: 7158802 Basic
> stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: d_date_sk (type: int), d_date (type:
> string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 73049 Data size: 7158802 Basic
> stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 73049 Data size: 7158802 Basic
> stats: COMPLETE Column stats: COMPLETE
> value expressions: _col1 (type: string)
> Execution mode: vectorized
> Map 3
> Map Operator Tree:
> TableScan
> alias: catalog_sales
> filterExpr: (((cs_item_sk is not null and cs_bill_hdemo_sk
> is not null) and cs_bill_cdemo_sk is not null) and cs_ship_date_sk is not
> null) (type: boolean)
> Statistics: Num rows: 286549727 Data size: 37743959324
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (((cs_item_sk is not null and cs_bill_hdemo_sk
> is not null) and cs_bill_cdemo_sk is not null) and cs_ship_date_sk is not
> null) (type: boolean)
> Statistics: Num rows: 284396955 Data size: 7948828532
> Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: cs_ship_date_sk (type: int),
> cs_bill_cdemo_sk (type: int), cs_bill_hdemo_sk (type: int), cs_item_sk (type:
> int), cs_promo_sk (type: int), cs_order_number (type: int), cs_quantity
> (type: int), cs_sold_date_sk (type: int)
> outputColumnNames: _col0, _col1, _col2, _col3, _col4,
> _col5, _col6, _col7
> Statistics: Num rows: 284396955 Data size: 9086416352
> Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> condition expressions:
> 0 {_col0} {_col1} {_col2} {_col3} {_col4} {_col5}
> {_col6} {_col7}
> 1 {_col1} {_col2} {_col3}
> keys:
> 0 _col3 (type: int)
> 1 _col0 (type: int)
> outputColumnNames: _col0, _col1, _col2, _col3, _col4,
> _col5, _col6, _col7, _col9, _col10, _col11
> input vertices:
> 1 Map 7
> Statistics: Num rows: 275157677926 Data size:
> 8805045693632 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (_col10 < _col6) (type: boolean)
> Statistics: Num rows: 91719225975 Data size:
> 2935015231200 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col0 (type: int), _col1 (type:
> int), _col11 (type: int), _col2 (type: int), _col3 (type: int), _col4 (type:
> int), _col5 (type: int), _col7 (type: int), _col9 (type: int)
> outputColumnNames: _col0, _col1, _col11, _col2,
> _col3, _col4, _col5, _col7, _col9
> Statistics: Num rows: 91719225975 Data size:
> 2201261423400 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> condition expressions:
> 0 {_col0} {_col1} {_col2} {_col3} {_col4}
> {_col5} {_col9} {_col11}
> 1 {_col1} {_col2}
> keys:
> 0 _col7 (type: int)
> 1 _col0 (type: int)
> outputColumnNames: _col0, _col1, _col2, _col3,
> _col4, _col5, _col9, _col11, _col13, _col14
> input vertices:
> 1 Map 8
> Statistics: Num rows: 102398861876 Data size:
> 12083065701368 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> condition expressions:
> 0 {_col0} {_col1} {_col2} {_col3} {_col4}
> {_col5} {_col9} {_col13} {_col14}
> 1
> keys:
> 0 _col11 (type: int), _col14 (type: int)
> 1 _col0 (type: int), _col1 (type: int)
> outputColumnNames: _col0, _col1, _col2,
> _col3, _col4, _col5, _col9, _col13, _col14
> input vertices:
> 1 Map 1
> Statistics: Num rows: 8705458 Data size:
> 992422212 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> condition expressions:
> 0 {_col0} {_col1} {_col3} {_col4} {_col5}
> {_col9} {_col13} {_col14}
> 1
> keys:
> 0 _col2 (type: int)
> 1 _col0 (type: int)
> outputColumnNames: _col0, _col1, _col3,
> _col4, _col5, _col9, _col13, _col14
> input vertices:
> 1 Map 13
> Statistics: Num rows: 6739709 Data size:
> 768326826 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col0 (type: int), _col1
> (type: int), _col13 (type: string), _col14 (type: int), _col3 (type: int),
> _col4 (type: int), _col5 (type: int), _col9 (type: int)
> outputColumnNames: _col0, _col1, _col13,
> _col14, _col3, _col4, _col5, _col9
> Statistics: Num rows: 6739709 Data size:
> 768326826 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> condition expressions:
> 0
> 1 {_col0} {_col3} {_col4} {_col5}
> {_col9} {_col13} {_col14}
> keys:
> 0 _col0 (type: int)
> 1 _col1 (type: int)
> outputColumnNames: _col2, _col5, _col6,
> _col7, _col11, _col15, _col16
> input vertices:
> 0 Map 11
> Statistics: Num rows: 7051609 Data
> size: 803883426 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col11 (type: int),
> _col15 (type: string), _col16 (type: int), _col2 (type: int), _col5 (type:
> int), _col6 (type: int), _col7 (type: int)
> outputColumnNames: _col11, _col15,
> _col16, _col2, _col5, _col6, _col7
> Statistics: Num rows: 7051609 Data
> size: 803883426 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> condition expressions:
> 0 {_col1}
> 1 {_col5} {_col6} {_col7}
> {_col11} {_col15} {_col16}
> keys:
> 0 _col0 (type: int)
> 1 _col2 (type: int)
> outputColumnNames: _col1, _col7,
> _col8, _col9, _col13, _col17, _col18
> input vertices:
> 0 Map 2
> Statistics: Num rows: 7884543 Data
> size: 1639984944 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (UDFToDouble(_col1) >
> (UDFToDouble(_col17) + UDFToDouble(5))) (type: boolean)
> Statistics: Num rows: 2628181
> Data size: 546661648 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col13 (type:
> int), _col18 (type: int), _col7 (type: int), _col8 (type: int), _col9 (type:
> int)
> outputColumnNames: _col13,
> _col18, _col7, _col8, _col9
> Statistics: Num rows: 2628181
> Data size: 52563620 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> condition expressions:
> 0 {_col1}
> 1 {_col7} {_col8} {_col9}
> {_col13} {_col18}
> keys:
> 0 _col0 (type: int)
> 1 _col7 (type: int)
> outputColumnNames: _col1,
> _col9, _col10, _col11, _col15, _col20
> input vertices:
> 0 Map 6
> Statistics: Num rows:
> 29222304 Data size: 5961350016 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> condition expressions:
> 0 {_col1} {_col9}
> {_col10} {_col11} {_col20}
> 1 {_col1}
> keys:
> 0 _col15 (type: int)
> 1 _col0 (type: int)
> outputColumnNames: _col1,
> _col9, _col10, _col11, _col20, _col27
> input vertices:
> 1 Map 9
> Statistics: Num rows:
> 35066764 Data size: 10484962436 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col27
> (type: string), _col1 (type: string), _col20 (type: int), _col9 (type: int),
> _col10 (type: int), _col11 (type: int)
> outputColumnNames:
> _col13, _col15, _col22, _col3, _col4, _col5
> Statistics: Num rows:
> 35066764 Data size: 10484962436 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Right Outer Join0
> to 1
> condition expressions:
> 0 {_col0}
> 1 {_col3} {_col5}
> {_col13} {_col15} {_col22}
> keys:
> 0 _col0 (type: int)
> 1 _col4 (type: int)
> outputColumnNames:
> _col0, _col4, _col6, _col14, _col16, _col23
> input vertices:
> 0 Map 12
> Statistics: Num rows:
> 631201752 Data size: 188729323848 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col0
> (type: int), _col14 (type: string), _col16 (type: string), _col23 (type:
> int), _col4 (type: int), _col6 (type: int)
> outputColumnNames:
> _col0, _col14, _col16, _col23, _col4, _col6
> Statistics: Num rows:
> 631201752 Data size: 188729323848 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Right Outer
> Join0 to 1
> condition
> expressions:
> 0
> 1 {_col0}
> {_col14} {_col16} {_col23}
> keys:
> 0 _col0 (type:
> int), _col1 (type: int)
> 1 _col4 (type:
> int), _col6 (type: int)
> outputColumnNames:
> _col2, _col16, _col18, _col25
> input vertices:
> 0 Map 10
> Statistics: Num
> rows: 610256859 Data size: 177584745969 Basic stats: COMPLETE Column stats:
> COMPLETE
> Select Operator
> expressions:
> _col18 (type: string), _col16 (type: string), _col25 (type: int), CASE WHEN
> (_col2 is null) THEN (1) ELSE (0) END (type: int), CASE WHEN (_col2 is not
> null) THEN (1) ELSE (0) END (type: int)
>
> outputColumnNames: _col0, _col1, _col2, _col3, _col4
> Statistics: Num
> rows: 610256859 Data size: 177584745969 Basic stats: COMPLETE Column stats:
> COMPLETE
> Group By Operator
> aggregations:
> count(_col3), count(_col4), count()
> keys: _col0
> (type: string), _col1 (type: string), _col2 (type: int)
> mode: hash
>
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
> Statistics: Num
> rows: 44400 Data size: 1243200 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output
> Operator
> key
> expressions: _col0 (type: string), _col1 (type: string), _col2 (type: int)
> sort order:
> +++
> Map-reduce
> partition columns: _col0 (type: string), _col1 (type: string), _col2 (type:
> int)
> Statistics:
> Num rows: 44400 Data size: 1243200 Basic stats: COMPLETE Column stats:
> COMPLETE
> value
> expressions: _col3 (type: bigint), _col4 (type: bigint), _col5 (type: bigint)
> Execution mode: vectorized
> Map 6
> Map Operator Tree:
> TableScan
> alias: item
> filterExpr: i_item_sk is not null (type: boolean)
> Statistics: Num rows: 48000 Data size: 68732712 Basic
> stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: i_item_sk is not null (type: boolean)
> Statistics: Num rows: 48000 Data size: 9024000 Basic
> stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: i_item_sk (type: int), i_item_desc (type:
> string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 48000 Data size: 9024000 Basic
> stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 48000 Data size: 9024000 Basic
> stats: COMPLETE Column stats: COMPLETE
> value expressions: _col1 (type: string)
> Execution mode: vectorized
> Map 7
> Map Operator Tree:
> TableScan
> alias: inventory
> filterExpr: (inv_item_sk is not null and inv_warehouse_sk
> is not null) (type: boolean)
> Statistics: Num rows: 37584000 Data size: 443485104 Basic
> stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (inv_item_sk is not null and inv_warehouse_sk
> is not null) (type: boolean)
> Statistics: Num rows: 37584000 Data size: 443485104 Basic
> stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: inv_item_sk (type: int), inv_warehouse_sk
> (type: int), inv_quantity_on_hand (type: int), inv_date_sk (type: int)
> outputColumnNames: _col0, _col1, _col2, _col3
> Statistics: Num rows: 37584000 Data size: 593821104
> Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 37584000 Data size: 593821104
> Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col1 (type: int), _col2 (type:
> int), _col3 (type: int)
> Execution mode: vectorized
> Map 8
> Map Operator Tree:
> TableScan
> alias: d1
> filterExpr: (((d_year = 2001) and d_date_sk is not null)
> and d_week_seq is not null) (type: boolean)
> Statistics: Num rows: 73049 Data size: 81741831 Basic
> stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (((d_year = 2001) and d_date_sk is not null)
> and d_week_seq is not null) (type: boolean)
> Statistics: Num rows: 652 Data size: 69112 Basic stats:
> COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: d_date_sk (type: int), d_date (type:
> string), d_week_seq (type: int)
> outputColumnNames: _col0, _col1, _col2
> Statistics: Num rows: 652 Data size: 66504 Basic stats:
> COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 652 Data size: 66504 Basic
> stats: COMPLETE Column stats: COMPLETE
> value expressions: _col1 (type: string), _col2 (type:
> int)
> Select Operator
> expressions: _col0 (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 652 Data size: 0 Basic stats:
> PARTIAL Column stats: COMPLETE
> Group By Operator
> keys: _col0 (type: int)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 652 Data size: 0 Basic stats:
> PARTIAL Column stats: COMPLETE
> Dynamic Partitioning Event Operator
> Target Input: catalog_sales
> Partition key expr: cs_sold_date_sk
> Statistics: Num rows: 652 Data size: 0 Basic
> stats: PARTIAL Column stats: COMPLETE
> Target column: cs_sold_date_sk
> Target Vertex: Map 3
> Execution mode: vectorized
> Map 9
> Map Operator Tree:
> TableScan
> alias: warehouse
> filterExpr: w_warehouse_sk is not null (type: boolean)
> Statistics: Num rows: 6 Data size: 6166 Basic stats:
> COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: w_warehouse_sk is not null (type: boolean)
> Statistics: Num rows: 6 Data size: 618 Basic stats:
> COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: w_warehouse_sk (type: int),
> w_warehouse_name (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6 Data size: 618 Basic stats:
> COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 6 Data size: 618 Basic stats:
> COMPLETE Column stats: COMPLETE
> value expressions: _col1 (type: string)
> Execution mode: vectorized
> Reducer 4
> Reduce Operator Tree:
> Group By Operator
> aggregations: count(VALUE._col0), count(VALUE._col1),
> count(VALUE._col2)
> keys: KEY._col0 (type: string), KEY._col1 (type: string),
> KEY._col2 (type: int)
> mode: mergepartial
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
> Statistics: Num rows: 44400 Data size: 2131200 Basic stats:
> COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col0 (type: string), _col1 (type: string),
> _col2 (type: int), _col3 (type: bigint), _col4 (type: bigint), _col5 (type:
> bigint)
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
> Statistics: Num rows: 44400 Data size: 2131200 Basic stats:
> COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col5 (type: bigint), _col0 (type:
> string), _col1 (type: string), _col2 (type: int)
> sort order: -+++
> Statistics: Num rows: 44400 Data size: 2131200 Basic
> stats: COMPLETE Column stats: COMPLETE
> TopN Hash Memory Usage: 0.04
> value expressions: _col3 (type: bigint), _col4 (type:
> bigint)
> Execution mode: vectorized
> Reducer 5
> Reduce Operator Tree:
> Select Operator
> expressions: KEY.reducesinkkey1 (type: string),
> KEY.reducesinkkey2 (type: string), KEY.reducesinkkey3 (type: int),
> VALUE._col0 (type: bigint), VALUE._col1 (type: bigint), KEY.reducesinkkey0
> (type: bigint)
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
> Statistics: Num rows: 44400 Data size: 2131200 Basic stats:
> COMPLETE Column stats: COMPLETE
> Limit
> Number of rows: 100
> Statistics: Num rows: 100 Data size: 4800 Basic stats:
> COMPLETE Column stats: COMPLETE
> File Output Operator
> compressed: false
> Statistics: Num rows: 100 Data size: 4800 Basic stats:
> COMPLETE Column stats: COMPLETE
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format:
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> serde:
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Execution mode: vectorized
> Stage: Stage-0
> Fetch Operator
> limit: 100
> Processor Tree:
> ListSink
> Time taken: 16.504 seconds, Fetched: 463 row(s)
> {code}
> Plan with modified query
> {code}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 depends on stages: Stage-1
> STAGE PLANS:
> Stage: Stage-1
> Tez
> Edges:
> Map 2 <- Map 1 (BROADCAST_EDGE), Map 10 (BROADCAST_EDGE), Map 7
> (BROADCAST_EDGE)
> Map 7 <- Map 4 (BROADCAST_EDGE), Map 5 (BROADCAST_EDGE), Map 6
> (BROADCAST_EDGE), Map 8 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> DagName: mmokhtar_20150206180606_2f891814-bf41-4888-ab80-dddf4425205f:1
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: d1
> filterExpr: (d_date_sk is not null and d_week_seq is not
> null) (type: boolean)
> Statistics: Num rows: 73049 Data size: 81741831 Basic
> stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (d_date_sk is not null and d_week_seq is not
> null) (type: boolean)
> Statistics: Num rows: 73049 Data size: 584392 Basic
> stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: d_date_sk (type: int), d_week_seq (type:
> int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 73049 Data size: 584392 Basic
> stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int), _col1 (type: int)
> sort order: ++
> Map-reduce partition columns: _col0 (type: int),
> _col1 (type: int)
> Statistics: Num rows: 73049 Data size: 584392 Basic
> stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col0 (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 73049 Data size: 584392 Basic
> stats: COMPLETE Column stats: COMPLETE
> Group By Operator
> keys: _col0 (type: int)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 73049 Data size: 584392 Basic
> stats: COMPLETE Column stats: COMPLETE
> Dynamic Partitioning Event Operator
> Target Input: inventory
> Partition key expr: inv_date_sk
> Statistics: Num rows: 73049 Data size: 584392
> Basic stats: COMPLETE Column stats: COMPLETE
> Target column: inv_date_sk
> Target Vertex: Map 2
> Map 10
> Map Operator Tree:
> TableScan
> alias: warehouse
> filterExpr: w_warehouse_sk is not null (type: boolean)
> Statistics: Num rows: 6 Data size: 6166 Basic stats:
> COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: w_warehouse_sk is not null (type: boolean)
> Statistics: Num rows: 6 Data size: 24 Basic stats:
> COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: w_warehouse_sk (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 6 Data size: 24 Basic stats:
> COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 6 Data size: 24 Basic stats:
> COMPLETE Column stats: COMPLETE
> Map 2
> Map Operator Tree:
> TableScan
> alias: inventory
> filterExpr: (inv_item_sk is not null and inv_warehouse_sk
> is not null) (type: boolean)
> Statistics: Num rows: 37584000 Data size: 443485104 Basic
> stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (inv_item_sk is not null and inv_warehouse_sk
> is not null) (type: boolean)
> Statistics: Num rows: 37584000 Data size: 593821104 Basic
> stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: inv_item_sk (type: int), inv_warehouse_sk
> (type: int), inv_quantity_on_hand (type: int), inv_date_sk (type: int)
> outputColumnNames: _col0, _col1, _col2, _col3
> Statistics: Num rows: 37584000 Data size: 593821104
> Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col0 (type: int)
> 1 _col3 (type: int)
> outputColumnNames: _col1, _col2, _col3, _col8, _col17
> input vertices:
> 1 Map 7
> Statistics: Num rows: 937245073 Data size:
> 14995921168 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (_col2 < _col8) (type: boolean)
> Statistics: Num rows: 312415024 Data size:
> 4998640384 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col17 (type: int), _col1 (type:
> int), _col3 (type: int)
> outputColumnNames: _col13, _col18, _col20
> Statistics: Num rows: 312415024 Data size:
> 2499320192 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col0 (type: int), _col1 (type: int)
> 1 _col20 (type: int), _col13 (type: int)
> outputColumnNames: _col20
> input vertices:
> 0 Map 1
> Statistics: Num rows: 26560 Data size: 106240
> Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col20 (type: int)
> 1 _col0 (type: int)
> input vertices:
> 1 Map 10
> Statistics: Num rows: 26560 Data size: 0
> Basic stats: PARTIAL Column stats: COMPLETE
> Group By Operator
> aggregations: count()
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 8 Basic
> stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> sort order:
> Statistics: Num rows: 1 Data size: 8
> Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col0 (type: bigint)
> Map 4
> Map Operator Tree:
> TableScan
> alias: d1
> filterExpr: d_date_sk is not null (type: boolean)
> Statistics: Num rows: 73049 Data size: 81741831 Basic
> stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: d_date_sk is not null (type: boolean)
> Statistics: Num rows: 73049 Data size: 7158802 Basic
> stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: d_date_sk (type: int), d_date (type:
> string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 73049 Data size: 7158802 Basic
> stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 73049 Data size: 7158802 Basic
> stats: COMPLETE Column stats: COMPLETE
> value expressions: _col1 (type: string)
> Map 5
> Map Operator Tree:
> TableScan
> alias: item
> filterExpr: i_item_sk is not null (type: boolean)
> Statistics: Num rows: 48000 Data size: 68732712 Basic
> stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: i_item_sk is not null (type: boolean)
> Statistics: Num rows: 48000 Data size: 192000 Basic
> stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: i_item_sk (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 48000 Data size: 192000 Basic
> stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 48000 Data size: 192000 Basic
> stats: COMPLETE Column stats: COMPLETE
> Map 6
> Map Operator Tree:
> TableScan
> alias: customer_demographics
> filterExpr: ((cd_marital_status = 'M') and cd_demo_sk is
> not null) (type: boolean)
> Statistics: Num rows: 1920800 Data size: 718379200 Basic
> stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: ((cd_marital_status = 'M') and cd_demo_sk is
> not null) (type: boolean)
> Statistics: Num rows: 274400 Data size: 24421600 Basic
> stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: cd_demo_sk (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 274400 Data size: 1097600 Basic
> stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 274400 Data size: 1097600 Basic
> stats: COMPLETE Column stats: COMPLETE
> Map 7
> Map Operator Tree:
> TableScan
> alias: catalog_sales
> filterExpr: (((cs_bill_hdemo_sk is not null and
> cs_bill_cdemo_sk is not null) and cs_item_sk is not null) and cs_ship_date_sk
> is not null) (type: boolean)
> Statistics: Num rows: 286549727 Data size: 37743959324
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (((cs_bill_hdemo_sk is not null and
> cs_bill_cdemo_sk is not null) and cs_item_sk is not null) and cs_ship_date_sk
> is not null) (type: boolean)
> Statistics: Num rows: 284396955 Data size: 6814100200
> Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: cs_ship_date_sk (type: int),
> cs_bill_cdemo_sk (type: int), cs_bill_hdemo_sk (type: int), cs_item_sk (type:
> int), cs_quantity (type: int), cs_sold_date_sk (type: int)
> outputColumnNames: _col0, _col1, _col2, _col3, _col4,
> _col5
> Statistics: Num rows: 284396955 Data size: 6814100200
> Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col5 (type: int)
> 1 _col0 (type: int)
> outputColumnNames: _col0, _col1, _col2, _col3, _col4,
> _col7, _col8
> input vertices:
> 1 Map 8
> Statistics: Num rows: 101715202 Data size:
> 12002393836 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col2 (type: int)
> 1 _col0 (type: int)
> outputColumnNames: _col0, _col1, _col3, _col4,
> _col7, _col8
> input vertices:
> 1 Map 9
> Statistics: Num rows: 20343040 Data size:
> 2319106560 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col0 (type: int)
> 1 _col1 (type: int)
> outputColumnNames: _col2, _col5, _col6, _col9,
> _col10
> input vertices:
> 0 Map 6
> Statistics: Num rows: 2906149 Data size: 34873788
> Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col10 (type: int), _col2 (type:
> int), _col5 (type: int), _col6 (type: int), _col9 (type: string)
> outputColumnNames: _col10, _col2, _col5, _col6,
> _col9
> Statistics: Num rows: 2906149 Data size:
> 34873788 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col0 (type: int)
> 1 _col5 (type: int)
> outputColumnNames: _col3, _col6, _col7,
> _col10, _col11
> input vertices:
> 0 Map 5
> Statistics: Num rows: 2906149 Data size:
> 34873788 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col10 (type: string), _col11
> (type: int), _col3 (type: int), _col6 (type: int), _col7 (type: int)
> outputColumnNames: _col10, _col11, _col3,
> _col6, _col7
> Statistics: Num rows: 2906149 Data size:
> 34873788 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col0 (type: int)
> 1 _col3 (type: int)
> outputColumnNames: _col1, _col8, _col9,
> _col12, _col13
> input vertices:
> 0 Map 4
> Statistics: Num rows: 2906149 Data size:
> 296427198 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (UDFToDouble(_col1) >
> (UDFToDouble(_col12) + 5.0)) (type: boolean)
> Statistics: Num rows: 968716 Data size:
> 98809032 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col13 (type: int),
> _col8 (type: int), _col9 (type: int)
> outputColumnNames: _col13, _col3,
> _col4
> Statistics: Num rows: 968716 Data
> size: 7749728 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col3 (type: int)
> sort order: +
> Map-reduce partition columns: _col3
> (type: int)
> Statistics: Num rows: 968716 Data
> size: 7749728 Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col4 (type:
> int), _col13 (type: int)
> Map 8
> Map Operator Tree:
> TableScan
> alias: d1
> filterExpr: (((d_year = 2001) and d_date_sk is not null)
> and d_week_seq is not null) (type: boolean)
> Statistics: Num rows: 73049 Data size: 81741831 Basic
> stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (((d_year = 2001) and d_date_sk is not null)
> and d_week_seq is not null) (type: boolean)
> Statistics: Num rows: 652 Data size: 69112 Basic stats:
> COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: d_date_sk (type: int), d_date (type:
> string), d_week_seq (type: int)
> outputColumnNames: _col0, _col1, _col2
> Statistics: Num rows: 652 Data size: 66504 Basic stats:
> COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 652 Data size: 66504 Basic
> stats: COMPLETE Column stats: COMPLETE
> value expressions: _col1 (type: string), _col2 (type:
> int)
> Select Operator
> expressions: _col0 (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 652 Data size: 66504 Basic
> stats: COMPLETE Column stats: COMPLETE
> Group By Operator
> keys: _col0 (type: int)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 652 Data size: 66504 Basic
> stats: COMPLETE Column stats: COMPLETE
> Dynamic Partitioning Event Operator
> Target Input: catalog_sales
> Partition key expr: cs_sold_date_sk
> Statistics: Num rows: 652 Data size: 66504 Basic
> stats: COMPLETE Column stats: COMPLETE
> Target column: cs_sold_date_sk
> Target Vertex: Map 7
> Map 9
> Map Operator Tree:
> TableScan
> alias: household_demographics
> filterExpr: ((hd_buy_potential = '1001-5000') and
> hd_demo_sk is not null) (type: boolean)
> Statistics: Num rows: 7200 Data size: 770400 Basic stats:
> COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: ((hd_buy_potential = '1001-5000') and
> hd_demo_sk is not null) (type: boolean)
> Statistics: Num rows: 1440 Data size: 138240 Basic stats:
> COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: hd_demo_sk (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 1440 Data size: 5760 Basic stats:
> COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 1440 Data size: 5760 Basic
> stats: COMPLETE Column stats: COMPLETE
> Reducer 3
> Reduce Operator Tree:
> Group By Operator
> aggregations: count(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE
> Column stats: COMPLETE
> File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE
> Column stats: COMPLETE
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format:
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> serde:
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Stage: Stage-0
> Fetch Operator
> limit: -1
> Processor Tree:
> ListSink
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)