[jira] [Updated] (HIVE-10107) Union All : Vertex missing stats resulting in OOM and in-efficient plans

Laljo John Pullokkaran (JIRA) Fri, 08 May 2015 15:56:00 -0700

     [ https://issues.apache.org/jira/browse/HIVE-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Laljo John Pullokkaran updated HIVE-10107:
------------------------------------------
    Assignee:     (was: Prasanth Jayachandran)

> Union All : Vertex missing stats resulting in OOM and in-efficient plans
> ------------------------------------------------------------------------
>
>                 Key: HIVE-10107
>                 URL: https://issues.apache.org/jira/browse/HIVE-10107
>             Project: Hive
>          Issue Type: Bug
>          Components: Physical Optimizer
>    Affects Versions: 0.14.0
>            Reporter: Mostafa Mokhtar
>
> Reducer vertices that send data to a UNION ALL edge are missing statistics. 
> As a result, we either use very few reducers on the UNION ALL edge or decide 
> to broadcast the results of the UNION ALL.
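> Both outcomes follow from planner estimates, so they can be overridden per 
> session while investigating. The settings below are only a sketch for 
> experimentation, not a fix confirmed by this report; the reducer count is an 
> illustrative assumption and should be sized for the actual data.
> {code}
> -- Assumption: turn off automatic map-join conversion so that the UNION ALL
> -- result is shuffled instead of being broadcast.
> set hive.auto.convert.join=false;
> -- Assumption: pin the reducer count instead of relying on the estimate
> -- derived from the missing statistics (200 is a placeholder value).
> set mapred.reduce.tasks=200;
> {code}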
> Query
> {code}
> select 
>     count(*) rowcount
> from
>     (select 
>         ss_item_sk, ss_ticket_number, ss_store_sk
>     from
>         store_sales a, store_returns b
>     where
>         a.ss_item_sk = b.sr_item_sk
>             and a.ss_ticket_number = b.sr_ticket_number
>     union all
>     select
>         ss_item_sk, ss_ticket_number, ss_store_sk
>     from
>         store_sales c, store_returns d
>     where
>         c.ss_item_sk = d.sr_item_sk
>             and c.ss_ticket_number = d.sr_ticket_number) t
> group by t.ss_store_sk , t.ss_item_sk , t.ss_ticket_number
> having rowcount > 100000000;
> {code}
> Plan snippet 
> {code}
>  Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Union 3 (CONTAINS)
>         Reducer 4 <- Union 3 (SIMPLE_EDGE)
>         Reducer 7 <- Map 6 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE), Union 3 (CONTAINS)
>         Reducer 4
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: count(VALUE._col0)
>                 keys: KEY._col0 (type: int), KEY._col1 (type: int), KEY._col2 (type: int)
>                 mode: mergepartial
>                 outputColumnNames: _col0, _col1, _col2, _col3
>                 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                 Filter Operator
>                   predicate: (_col3 > 100000000) (type: boolean)
>                   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
>                   Select Operator
>                     expressions: _col3 (type: bigint)
>                     outputColumnNames: _col0
>                     Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
>                     File Output Operator
>                       compressed: false
>                       Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
>                       table:
>                           input format: org.apache.hadoop.mapred.TextInputFormat
>                           output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                           serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>         Reducer 7
>             Reduce Operator Tree:
>               Merge Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                 keys:
>                   0 ss_item_sk (type: int), ss_ticket_number (type: int)
>                   1 sr_item_sk (type: int), sr_ticket_number (type: int)
>                 outputColumnNames: _col1, _col6, _col8, _col27, _col34
>                 Filter Operator
>                   predicate: ((_col1 = _col27) and (_col8 = _col34)) (type: boolean)
>                   Select Operator
>                     expressions: _col1 (type: int), _col8 (type: int), _col6 (type: int)
>                     outputColumnNames: _col0, _col1, _col2
>                     Group By Operator
>                       aggregations: count()
>                       keys: _col2 (type: int), _col0 (type: int), _col1 (type: int)
>                       mode: hash
>                       outputColumnNames: _col0, _col1, _col2, _col3
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int), _col1 (type: int), _col2 (type: int)
>                         sort order: +++
>                         Map-reduce partition columns: _col0 (type: int), _col1 (type: int), _col2 (type: int)
>                         value expressions: _col3 (type: bigint)
> {code}
> The full explain plan 
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Union 3 (CONTAINS)
>         Reducer 4 <- Union 3 (SIMPLE_EDGE)
>         Reducer 7 <- Map 6 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE), Union 3 (CONTAINS)
>       DagName: mmokhtar_20150214132727_95878ea1-ee6a-4b7e-bc86-843abd5cf664:7
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: a
>                   filterExpr: (ss_item_sk is not null and ss_ticket_number is 
> not null) (type: boolean)
>                   Statistics: Num rows: 550076554 Data size: 47370018896 
> Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (ss_item_sk is not null and ss_ticket_number 
> is not null) (type: boolean)
>                     Statistics: Num rows: 550076554 Data size: 6549093948 
> Basic stats: COMPLETE Column stats: COMPLETE
>                     Reduce Output Operator
>                       key expressions: ss_item_sk (type: int), 
> ss_ticket_number (type: int)
>                       sort order: ++
>                       Map-reduce partition columns: ss_item_sk (type: int), 
> ss_ticket_number (type: int)
>                       Statistics: Num rows: 550076554 Data size: 6549093948 
> Basic stats: COMPLETE Column stats: COMPLETE
>                       value expressions: ss_store_sk (type: int)
>         Map 5
>             Map Operator Tree:
>                 TableScan
>                   alias: b
>                   filterExpr: (sr_item_sk is not null and sr_ticket_number is 
> not null) (type: boolean)
>                   Statistics: Num rows: 55578005 Data size: 4155315616 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (sr_item_sk is not null and sr_ticket_number 
> is not null) (type: boolean)
>                     Statistics: Num rows: 55578005 Data size: 444624040 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                     Reduce Output Operator
>                       key expressions: sr_item_sk (type: int), 
> sr_ticket_number (type: int)
>                       sort order: ++
>                       Map-reduce partition columns: sr_item_sk (type: int), 
> sr_ticket_number (type: int)
>                       Statistics: Num rows: 55578005 Data size: 444624040 
> Basic stats: COMPLETE Column stats: COMPLETE
>         Map 6
>             Map Operator Tree:
>                 TableScan
>                   alias: c
>                   filterExpr: (ss_item_sk is not null and ss_ticket_number is 
> not null) (type: boolean)
>                   Statistics: Num rows: 550076554 Data size: 47370018896 
> Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (ss_item_sk is not null and ss_ticket_number 
> is not null) (type: boolean)
>                     Statistics: Num rows: 550076554 Data size: 6549093948 
> Basic stats: COMPLETE Column stats: COMPLETE
>                     Reduce Output Operator
>                       key expressions: ss_item_sk (type: int), 
> ss_ticket_number (type: int)
>                       sort order: ++
>                       Map-reduce partition columns: ss_item_sk (type: int), 
> ss_ticket_number (type: int)
>                       Statistics: Num rows: 550076554 Data size: 6549093948 
> Basic stats: COMPLETE Column stats: COMPLETE
>                       value expressions: ss_store_sk (type: int)
>         Map 8
>             Map Operator Tree:
>                 TableScan
>                   alias: d
>                   filterExpr: (sr_item_sk is not null and sr_ticket_number is 
> not null) (type: boolean)
>                   Statistics: Num rows: 55578005 Data size: 4155315616 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (sr_item_sk is not null and sr_ticket_number 
> is not null) (type: boolean)
>                     Statistics: Num rows: 55578005 Data size: 444624040 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                     Reduce Output Operator
>                       key expressions: sr_item_sk (type: int), 
> sr_ticket_number (type: int)
>                       sort order: ++
>                       Map-reduce partition columns: sr_item_sk (type: int), 
> sr_ticket_number (type: int)
>                       Statistics: Num rows: 55578005 Data size: 444624040 
> Basic stats: COMPLETE Column stats: COMPLETE
>         Reducer 2
>             Reduce Operator Tree:
>               Merge Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                 keys:
>                   0 ss_item_sk (type: int), ss_ticket_number (type: int)
>                   1 sr_item_sk (type: int), sr_ticket_number (type: int)
>                 outputColumnNames: _col1, _col6, _col8, _col27, _col34
>                 Filter Operator
>                   predicate: ((_col1 = _col27) and (_col8 = _col34)) (type: 
> boolean)
>                   Select Operator
>                     expressions: _col1 (type: int), _col8 (type: int), _col6 
> (type: int)
>                     outputColumnNames: _col0, _col1, _col2
>                     Group By Operator
>                       aggregations: count()
>                       keys: _col2 (type: int), _col0 (type: int), _col1 
> (type: int)
>                       mode: hash
>                       outputColumnNames: _col0, _col1, _col2, _col3
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int), _col1 (type: 
> int), _col2 (type: int)
>                         sort order: +++
>                         Map-reduce partition columns: _col0 (type: int), 
> _col1 (type: int), _col2 (type: int)
>                         value expressions: _col3 (type: bigint)
>         Reducer 4
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: count(VALUE._col0)
>                 keys: KEY._col0 (type: int), KEY._col1 (type: int), KEY._col2 
> (type: int)
>                 mode: mergepartial
>                 outputColumnNames: _col0, _col1, _col2, _col3
>                 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
> Column stats: COMPLETE
>                 Filter Operator
>                   predicate: (_col3 > 100000000) (type: boolean)
>                   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
> Column stats: COMPLETE
>                   Select Operator
>                     expressions: _col3 (type: bigint)
>                     outputColumnNames: _col0
>                     Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
> Column stats: COMPLETE
>                     File Output Operator
>                       compressed: false
>                       Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
> Column stats: COMPLETE
>                       table:
>                           input format: 
> org.apache.hadoop.mapred.TextInputFormat
>                           output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                           serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>         Reducer 7
>             Reduce Operator Tree:
>               Merge Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                 keys:
>                   0 ss_item_sk (type: int), ss_ticket_number (type: int)
>                   1 sr_item_sk (type: int), sr_ticket_number (type: int)
>                 outputColumnNames: _col1, _col6, _col8, _col27, _col34
>                 Filter Operator
>                   predicate: ((_col1 = _col27) and (_col8 = _col34)) (type: 
> boolean)
>                   Select Operator
>                     expressions: _col1 (type: int), _col8 (type: int), _col6 
> (type: int)
>                     outputColumnNames: _col0, _col1, _col2
>                     Group By Operator
>                       aggregations: count()
>                       keys: _col2 (type: int), _col0 (type: int), _col1 
> (type: int)
>                       mode: hash
>                       outputColumnNames: _col0, _col1, _col2, _col3
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int), _col1 (type: 
> int), _col2 (type: int)
>                         sort order: +++
>                         Map-reduce partition columns: _col0 (type: int), 
> _col1 (type: int), _col2 (type: int)
>                         value expressions: _col3 (type: bigint)
>         Union 3
>             Vertex: Union 3
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
> TPC-DS Q54 also fails with an OOM; this failure happens when a different 
> plan is chosen.
> The OOM happens in vertexName=Map 14.
> {code}
> explain  
> with my_customers as (
>  select  c_customer_sk
>         , c_current_addr_sk
>  from   
>         ( select cs_sold_date_sk sold_date_sk,
>                  cs_bill_customer_sk customer_sk,
>                  cs_item_sk item_sk
>           from   catalog_sales
>           union all
>           select ws_sold_date_sk sold_date_sk,
>                  ws_bill_customer_sk customer_sk,
>                  ws_item_sk item_sk
>           from   web_sales
>          ) cs_or_ws_sales,
>          item,
>          date_dim,
>          customer
>  where   sold_date_sk = d_date_sk
>          and item_sk = i_item_sk
>          and i_category = 'Jewelry'
>          and i_class = 'football'
>          and c_customer_sk = cs_or_ws_sales.customer_sk
>          and d_moy = 3
>          and d_year = 2000
>          group by  c_customer_sk
>         , c_current_addr_sk
>  )
>  , my_revenue as (
>  select c_customer_sk,
>         sum(ss_ext_sales_price) as revenue
>  from   my_customers,
>         store_sales,
>         customer_address,
>         store,
>         date_dim
>  where  c_current_addr_sk = ca_address_sk
>         and ca_county = s_county
>         and ca_state = s_state
>         and ss_sold_date_sk = d_date_sk
>         and c_customer_sk = ss_customer_sk
>         and d_month_seq between (1203)
>                            and  (1205)
>  group by c_customer_sk
>  )
>  , segments as
>  (select cast((revenue/50) as int) as segment
>   from   my_revenue
>  )
>   select  segment, count(*) as num_customers, segment*50 as segment_base
>  from segments
>  group by segment
>  order by segment, num_customers
>  limit 100
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       Edges:
>         Map 1 <- Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE)
>         Map 10 <- Map 13 (BROADCAST_EDGE), Union 11 (CONTAINS)
>         Map 12 <- Map 13 (BROADCAST_EDGE), Union 11 (CONTAINS)
>         Map 14 <- Union 11 (BROADCAST_EDGE)
>         Map 6 <- Map 7 (BROADCAST_EDGE), Reducer 9 (BROADCAST_EDGE)
>         Map 8 <- Map 14 (BROADCAST_EDGE)
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>         Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>         Reducer 4 <- Reducer 3 (SIMPLE_EDGE)
>         Reducer 9 <- Map 8 (SIMPLE_EDGE)
>       DagName: mmokhtar_20150208232525_9976b56b-8f4b-48c8-a909-aa653c20051c:1
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: store_sales
>                   filterExpr: ss_customer_sk is not null (type: boolean)
>                   Statistics: Num rows: 82510879939 Data size: 6873789738208 
> Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: ss_customer_sk is not null (type: boolean)
>                     Statistics: Num rows: 80566020964 Data size: 951594129356 
> Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: ss_customer_sk (type: int), 
> ss_ext_sales_price (type: float), ss_sold_date_sk (type: int)
>                       outputColumnNames: _col0, _col1, _col2
>                       Statistics: Num rows: 80566020964 Data size: 
> 951594129356 Basic stats: COMPLETE Column stats: COMPLETE
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         keys:
>                           0 _col2 (type: int)
>                           1 _col0 (type: int)
>                         outputColumnNames: _col0, _col1
>                         input vertices:
>                           1 Map 5
>                         Statistics: Num rows: 90081226648 Data size: 
> 720649813184 Basic stats: COMPLETE Column stats: COMPLETE
>                         Map Join Operator
>                           condition map:
>                                Inner Join 0 to 1
>                           keys:
>                             0 _col0 (type: int)
>                             1 _col5 (type: int)
>                           outputColumnNames: _col1, _col10
>                           input vertices:
>                             1 Map 6
>                           Statistics: Num rows: 99089351460 Data size: 
> 792714811684 Basic stats: COMPLETE Column stats: NONE
>                           Select Operator
>                             expressions: _col10 (type: int), _col1 (type: 
> float)
>                             outputColumnNames: _col0, _col1
>                             Statistics: Num rows: 99089351460 Data size: 
> 792714811684 Basic stats: COMPLETE Column stats: NONE
>                             Group By Operator
>                               aggregations: sum(_col1)
>                               keys: _col0 (type: int)
>                               mode: hash
>                               outputColumnNames: _col0, _col1
>                               Statistics: Num rows: 99089351460 Data size: 
> 792714811684 Basic stats: COMPLETE Column stats: NONE
>                               Reduce Output Operator
>                                 key expressions: _col0 (type: int)
>                                 sort order: +
>                                 Map-reduce partition columns: _col0 (type: 
> int)
>                                 Statistics: Num rows: 99089351460 Data size: 
> 792714811684 Basic stats: COMPLETE Column stats: NONE
>                                 value expressions: _col1 (type: double)
>             Execution mode: vectorized
>         Map 10 
>             Map Operator Tree:
>                 TableScan
>                   alias: catalog_sales
>                   filterExpr: (cs_item_sk is not null and cs_bill_customer_sk 
> is not null) (type: boolean)
>                   Filter Operator
>                     predicate: (cs_item_sk is not null and 
> cs_bill_customer_sk is not null) (type: boolean)
>                     Select Operator
>                       expressions: cs_sold_date_sk (type: int), 
> cs_bill_customer_sk (type: int), cs_item_sk (type: int)
>                       outputColumnNames: _col0, _col1, _col2
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         keys:
>                           0 _col0 (type: int)
>                           1 _col0 (type: int)
>                         outputColumnNames: _col1, _col2
>                         input vertices:
>                           1 Map 13
>                         Reduce Output Operator
>                           key expressions: _col2 (type: int)
>                           sort order: +
>                           Map-reduce partition columns: _col2 (type: int)
>                           value expressions: _col1 (type: int)
>             Execution mode: vectorized
>         Map 12 
>             Map Operator Tree:
>                 TableScan
>                   alias: web_sales
>                   filterExpr: (ws_item_sk is not null and ws_bill_customer_sk 
> is not null) (type: boolean)
>                   Filter Operator
>                     predicate: (ws_item_sk is not null and 
> ws_bill_customer_sk is not null) (type: boolean)
>                     Select Operator
>                       expressions: ws_sold_date_sk (type: int), 
> ws_bill_customer_sk (type: int), ws_item_sk (type: int)
>                       outputColumnNames: _col0, _col1, _col2
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         keys:
>                           0 _col0 (type: int)
>                           1 _col0 (type: int)
>                         outputColumnNames: _col1, _col2
>                         input vertices:
>                           1 Map 13
>                         Reduce Output Operator
>                           key expressions: _col2 (type: int)
>                           sort order: +
>                           Map-reduce partition columns: _col2 (type: int)
>                           value expressions: _col1 (type: int)
>             Execution mode: vectorized
>         Map 13 
>             Map Operator Tree:
>                 TableScan
>                   alias: date_dim
>                   filterExpr: (((d_moy = 3) and (d_year = 2000)) and 
> d_date_sk is not null) (type: boolean)
>                   Statistics: Num rows: 73049 Data size: 81741831 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (((d_moy = 3) and (d_year = 2000)) and 
> d_date_sk is not null) (type: boolean)
>                     Statistics: Num rows: 624 Data size: 7488 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: d_date_sk (type: int)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 624 Data size: 2496 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: int)
>                         Statistics: Num rows: 624 Data size: 2496 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                       Select Operator
>                         expressions: _col0 (type: int)
>                         outputColumnNames: _col0
>                         Statistics: Num rows: 624 Data size: 2496 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                         Group By Operator
>                           keys: _col0 (type: int)
>                           mode: hash
>                           outputColumnNames: _col0
>                           Statistics: Num rows: 312 Data size: 1248 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                           Dynamic Partitioning Event Operator
>                             Target Input: catalog_sales
>                             Partition key expr: cs_sold_date_sk
>                             Statistics: Num rows: 312 Data size: 1248 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                             Target column: cs_sold_date_sk
>                             Target Vertex: Map 10
>                       Select Operator
>                         expressions: _col0 (type: int)
>                         outputColumnNames: _col0
>                         Statistics: Num rows: 624 Data size: 2496 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                         Group By Operator
>                           keys: _col0 (type: int)
>                           mode: hash
>                           outputColumnNames: _col0
>                           Statistics: Num rows: 312 Data size: 1248 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                           Dynamic Partitioning Event Operator
>                             Target Input: web_sales
>                             Partition key expr: ws_sold_date_sk
>                             Statistics: Num rows: 312 Data size: 1248 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                             Target column: ws_sold_date_sk
>                             Target Vertex: Map 12
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: int)
>                         Statistics: Num rows: 624 Data size: 2496 Basic 
> stats: COMPLETE Column stats: COMPLETE
>             Execution mode: vectorized
>         Map 14 
>             Map Operator Tree:
>                 TableScan
>                   alias: item
>                   filterExpr: (((i_category = 'Jewelry') and (i_class = 
> 'football')) and i_item_sk is not null) (type: boolean)
>                   Statistics: Num rows: 462000 Data size: 663862160 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (((i_category = 'Jewelry') and (i_class = 
> 'football')) and i_item_sk is not null) (type: boolean)
>                     Statistics: Num rows: 4200 Data size: 781200 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: i_item_sk (type: int)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 4200 Data size: 16800 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         keys:
>                           0 _col2 (type: int)
>                           1 _col0 (type: int)
>                         outputColumnNames: _col1
>                         input vertices:
>                           0 Union 11
>                         Statistics: Num rows: 79189328781 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                         Reduce Output Operator
>                           key expressions: _col1 (type: int)
>                           sort order: +
>                           Map-reduce partition columns: _col1 (type: int)
>                           Statistics: Num rows: 79189328781 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>             Execution mode: vectorized
>         Map 5 
>             Map Operator Tree:
>                 TableScan
>                   alias: date_dim
>                   filterExpr: (d_month_seq BETWEEN 1203 AND 1205 and 
> d_date_sk is not null) (type: boolean)
>                   Statistics: Num rows: 73049 Data size: 81741831 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (d_month_seq BETWEEN 1203 AND 1205 and 
> d_date_sk is not null) (type: boolean)
>                     Statistics: Num rows: 36524 Data size: 292192 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: d_date_sk (type: int)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 36524 Data size: 146096 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: int)
>                         Statistics: Num rows: 36524 Data size: 146096 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                       Select Operator
>                         expressions: _col0 (type: int)
>                         outputColumnNames: _col0
>                         Statistics: Num rows: 36524 Data size: 146096 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                         Group By Operator
>                           keys: _col0 (type: int)
>                           mode: hash
>                           outputColumnNames: _col0
>                           Statistics: Num rows: 18262 Data size: 73048 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                           Dynamic Partitioning Event Operator
>                             Target Input: store_sales
>                             Partition key expr: ss_sold_date_sk
>                             Statistics: Num rows: 18262 Data size: 73048 
> Basic stats: COMPLETE Column stats: COMPLETE
>                             Target column: ss_sold_date_sk
>                             Target Vertex: Map 1
>             Execution mode: vectorized
>         Map 6 
>             Map Operator Tree:
>                 TableScan
>                   alias: customer_address
>                   filterExpr: ((ca_county is not null and ca_state is not 
> null) and ca_address_sk is not null) (type: boolean)
>                   Statistics: Num rows: 40000000 Data size: 40595195284 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: ((ca_county is not null and ca_state is not 
> null) and ca_address_sk is not null) (type: boolean)
>                     Statistics: Num rows: 40000000 Data size: 7520000000 
> Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: ca_address_sk (type: int), ca_county 
> (type: string), ca_state (type: string)
>                       outputColumnNames: _col0, _col1, _col2
>                       Statistics: Num rows: 40000000 Data size: 7520000000 
> Basic stats: COMPLETE Column stats: COMPLETE
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         keys:
>                           0 _col1 (type: string), _col2 (type: string)
>                           1 _col0 (type: string), _col1 (type: string)
>                         outputColumnNames: _col0
>                         input vertices:
>                           1 Map 7
>                         Statistics: Num rows: 778829 Data size: 3115316 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                         Map Join Operator
>                           condition map:
>                                Inner Join 0 to 1
>                           keys:
>                             0 _col0 (type: int)
>                             1 _col1 (type: int)
>                           outputColumnNames: _col5
>                           input vertices:
>                             1 Reducer 9
>                           Statistics: Num rows: 47909545988 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                           Reduce Output Operator
>                             key expressions: _col5 (type: int)
>                             sort order: +
>                             Map-reduce partition columns: _col5 (type: int)
>                             Statistics: Num rows: 47909545988 Data size: 0 
> Basic stats: PARTIAL Column stats: NONE
>             Execution mode: vectorized
>         Map 7 
>             Map Operator Tree:
>                 TableScan
>                   alias: store
>                   filterExpr: (s_county is not null and s_state is not null) 
> (type: boolean)
>                   Statistics: Num rows: 1704 Data size: 3256276 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (s_county is not null and s_state is not null) 
> (type: boolean)
>                     Statistics: Num rows: 1704 Data size: 313536 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: s_county (type: string), s_state (type: 
> string)
>                       outputColumnNames: _col0, _col1
>                       Statistics: Num rows: 1704 Data size: 313536 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: string), _col1 (type: 
> string)
>                         sort order: ++
>                         Map-reduce partition columns: _col0 (type: string), 
> _col1 (type: string)
>                         Statistics: Num rows: 1704 Data size: 313536 Basic 
> stats: COMPLETE Column stats: COMPLETE
>             Execution mode: vectorized
>         Map 8 
>             Map Operator Tree:
>                 TableScan
>                   alias: customer
>                   filterExpr: (c_customer_sk is not null and 
> c_current_addr_sk is not null) (type: boolean)
>                   Statistics: Num rows: 80000000 Data size: 68801615852 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (c_customer_sk is not null and 
> c_current_addr_sk is not null) (type: boolean)
>                     Statistics: Num rows: 80000000 Data size: 640000000 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: c_customer_sk (type: int), 
> c_current_addr_sk (type: int)
>                       outputColumnNames: _col0, _col1
>                       Statistics: Num rows: 80000000 Data size: 640000000 
> Basic stats: COMPLETE Column stats: COMPLETE
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         keys:
>                           0 _col0 (type: int)
>                           1 _col1 (type: int)
>                         outputColumnNames: _col0, _col1
>                         input vertices:
>                           1 Map 14
>                         Statistics: Num rows: 87108263547 Data size: 0 Basic 
> stats: PARTIAL Column stats: NONE
>                         Group By Operator
>                           keys: _col0 (type: int), _col1 (type: int)
>                           mode: hash
>                           outputColumnNames: _col0, _col1
>                           Statistics: Num rows: 87108263547 Data size: 0 
> Basic stats: PARTIAL Column stats: NONE
>                           Reduce Output Operator
>                             key expressions: _col0 (type: int), _col1 (type: 
> int)
>                             sort order: ++
>                             Map-reduce partition columns: _col0 (type: int), 
> _col1 (type: int)
>                             Statistics: Num rows: 87108263547 Data size: 0 
> Basic stats: PARTIAL Column stats: NONE
>             Execution mode: vectorized
>         Reducer 2 
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: sum(VALUE._col0)
>                 keys: KEY._col0 (type: int)
>                 mode: mergepartial
>                 outputColumnNames: _col0, _col1
>                 Statistics: Num rows: 49544675730 Data size: 396357405842 
> Basic stats: COMPLETE Column stats: NONE
>                 Select Operator
>                   expressions: UDFToInteger((_col1 / 50.0)) (type: int)
>                   outputColumnNames: _col0
>                   Statistics: Num rows: 49544675730 Data size: 396357405842 
> Basic stats: COMPLETE Column stats: NONE
>                   Group By Operator
>                     aggregations: count()
>                     keys: _col0 (type: int)
>                     mode: hash
>                     outputColumnNames: _col0, _col1
>                     Statistics: Num rows: 49544675730 Data size: 396357405842 
> Basic stats: COMPLETE Column stats: NONE
>                     Reduce Output Operator
>                       key expressions: _col0 (type: int)
>                       sort order: +
>                       Map-reduce partition columns: _col0 (type: int)
>                       Statistics: Num rows: 49544675730 Data size: 
> 396357405842 Basic stats: COMPLETE Column stats: NONE
>                       value expressions: _col1 (type: bigint)
>         Reducer 3 
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: count(VALUE._col0)
>                 keys: KEY._col0 (type: int)
>                 mode: mergepartial
>                 outputColumnNames: _col0, _col1
>                 Statistics: Num rows: 24772337865 Data size: 198178702921 
> Basic stats: COMPLETE Column stats: NONE
>                 Select Operator
>                   expressions: _col0 (type: int), _col1 (type: bigint), 
> (_col0 * 50) (type: int)
>                   outputColumnNames: _col0, _col1, _col2
>                   Statistics: Num rows: 24772337865 Data size: 198178702921 
> Basic stats: COMPLETE Column stats: NONE
>                   Reduce Output Operator
>                     key expressions: _col0 (type: int), _col1 (type: bigint)
>                     sort order: ++
>                     Statistics: Num rows: 24772337865 Data size: 198178702921 
> Basic stats: COMPLETE Column stats: NONE
>                     TopN Hash Memory Usage: 0.04
>                     value expressions: _col2 (type: int)
>         Reducer 4 
>             Reduce Operator Tree:
>               Select Operator
>                 expressions: KEY.reducesinkkey0 (type: int), 
> KEY.reducesinkkey1 (type: bigint), VALUE._col0 (type: int)
>                 outputColumnNames: _col0, _col1, _col2
>                 Statistics: Num rows: 24772337865 Data size: 198178702921 
> Basic stats: COMPLETE Column stats: NONE
>                 Limit
>                   Number of rows: 100
>                   Statistics: Num rows: 100 Data size: 800 Basic stats: 
> COMPLETE Column stats: NONE
>                   File Output Operator
>                     compressed: false
>                     Statistics: Num rows: 100 Data size: 800 Basic stats: 
> COMPLETE Column stats: NONE
>                     table:
>                         input format: org.apache.hadoop.mapred.TextInputFormat
>                         output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                         serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>         Reducer 9 
>             Reduce Operator Tree:
>               Group By Operator
>                 keys: KEY._col0 (type: int), KEY._col1 (type: int)
>                 mode: mergepartial
>                 outputColumnNames: _col0, _col1
>                 Statistics: Num rows: 43554131773 Data size: 0 Basic stats: 
> PARTIAL Column stats: NONE
>                 Reduce Output Operator
>                   key expressions: _col1 (type: int)
>                   sort order: +
>                   Map-reduce partition columns: _col1 (type: int)
>                   Statistics: Num rows: 43554131773 Data size: 0 Basic stats: 
> PARTIAL Column stats: NONE
>                   value expressions: _col0 (type: int)
>         Union 11 
>             Vertex: Union 11
>   Stage: Stage-0
>     Fetch Operator
>       limit: 100
>       Processor Tree:
>         ListSink
> {code}
> In Map 14, the Map Join and Reduce Output operators report Data size: 0 (Basic stats: PARTIAL, Column stats: NONE):
> {code}
>         Map 14 
>             Map Operator Tree:
>                 TableScan
>                   alias: item
>                   filterExpr: (((i_category = 'Jewelry') and (i_class = 
> 'football')) and i_item_sk is not null) (type: boolean)
>                   Statistics: Num rows: 462000 Data size: 663862160 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (((i_category = 'Jewelry') and (i_class = 
> 'football')) and i_item_sk is not null) (type: boolean)
>                     Statistics: Num rows: 4200 Data size: 781200 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: i_item_sk (type: int)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 4200 Data size: 16800 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         keys:
>                           0 _col2 (type: int)
>                           1 _col0 (type: int)
>                         outputColumnNames: _col1
>                         input vertices:
>                           0 Union 11
>                         Statistics: Num rows: 79189328781 Data size: 0 Basic 
> stats: PARTIAL Column stats: NONE
>                         Reduce Output Operator
>                           key expressions: _col1 (type: int)
>                           sort order: +
>                           Map-reduce partition columns: _col1 (type: int)
>                           Statistics: Num rows: 79189328781 Data size: 0 
> Basic stats: PARTIAL Column stats: NONE
>             Execution mode: vectorized
> {code}
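
One way to spot the affected operators in a long plan like the one above is to scan the EXPLAIN text for Statistics annotations that report rows but a data size of 0. The following is a minimal sketch, not part of Hive: the function name `find_suspect_stats` and the "rows > 0 and size == 0" heuristic are assumptions made for illustration, and the parser only handles the annotation format shown in this plan (including the mail quote markers and line wrapping).

```python
import re

# Format of a Statistics annotation as it appears in the plan above,
# after quote markers and line wrapping have been removed.
STATS_RE = re.compile(
    r"Statistics: Num rows: (\d+) Data size: (\d+) "
    r"Basic stats: (\w+) Column stats: (\w+)"
)


def find_suspect_stats(plan_text):
    """Return (num_rows, basic_stats, column_stats) for every Statistics
    annotation that reports rows but a data size of 0.

    Hypothetical helper: the zero-size heuristic is an assumption,
    not a rule taken from Hive itself.
    """
    # Drop the mail quote marker at the start of each line.
    unquoted = (re.sub(r"^\s*>\s?", "", line) for line in plan_text.splitlines())
    # Collapse all whitespace so annotations wrapped across lines
    # match as a single string.
    flat = " ".join(" ".join(unquoted).split())

    suspects = []
    for rows, size, basic, column in STATS_RE.findall(flat):
        if int(rows) > 0 and int(size) == 0:
            suspects.append((int(rows), basic, column))
    return suspects


if __name__ == "__main__":
    # Two annotations copied from the Map 14 snippet above.
    sample = (
        ">                       Statistics: Num rows: 4200 Data size: 16800 Basic \n"
        "> stats: COMPLETE Column stats: COMPLETE\n"
        ">                         Statistics: Num rows: 79189328781 Data size: 0 Basic \n"
        "> stats: PARTIAL Column stats: NONE\n"
    )
    print(find_suspect_stats(sample))
    # prints [(79189328781, 'PARTIAL', 'NONE')]
```

Run against the full plan, this would list every operator annotation downstream of Union 11 that lost its size estimate, which is where the reducer-count and broadcast decisions described in this issue go wrong.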



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
