[ https://issues.apache.org/jira/browse/HIVE-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Laljo John Pullokkaran updated HIVE-10107:
------------------------------------------
    Assignee:     (was: Prasanth Jayachandran)

> Union All : Vertex missing stats resulting in OOM and in-efficient plans
> ------------------------------------------------------------------------
>
>                 Key: HIVE-10107
>                 URL: https://issues.apache.org/jira/browse/HIVE-10107
>             Project: Hive
>          Issue Type: Bug
>          Components: Physical Optimizer
>    Affects Versions: 0.14.0
>            Reporter: Mostafa Mokhtar
>
> Reducer Vertices sending data to a Union all edge are missing statistics and
> as a result we either use very few reducers in the UNION ALL edge or decide
> to broadcast the results of UNION ALL.
> Query
> {code}
> select
>     count(*) rowcount
> from
>     (select
>         ss_item_sk, ss_ticket_number, ss_store_sk
>     from
>         store_sales a, store_returns b
>     where
>         a.ss_item_sk = b.sr_item_sk
>             and a.ss_ticket_number = b.sr_ticket_number
>     union all
>     select
>         ss_item_sk, ss_ticket_number, ss_store_sk
>     from
>         store_sales c, store_returns d
>     where
>         c.ss_item_sk = d.sr_item_sk
>             and c.ss_ticket_number = d.sr_ticket_number) t
> group by t.ss_store_sk , t.ss_item_sk , t.ss_ticket_number
> having rowcount > 100000000;
> {code}
> Plan snippet
> {code}
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Union 3 (CONTAINS)
>         Reducer 4 <- Union 3 (SIMPLE_EDGE)
>         Reducer 7 <- Map 6 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE), Union 3 (CONTAINS)
>         Reducer 4
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: count(VALUE._col0)
>                 keys: KEY._col0 (type: int), KEY._col1 (type: int), KEY._col2 (type: int)
>                 mode: mergepartial
>                 outputColumnNames: _col0, _col1, _col2, _col3
>                 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                 Filter Operator
>                   predicate: (_col3 > 100000000) (type: boolean)
>                   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
>                   Select Operator
>                     expressions: _col3 (type: bigint)
>                     outputColumnNames: _col0
>                     Statistics: Num rows: 0
Data size: 0 Basic stats: NONE > Column stats: COMPLETE > File Output Operator > compressed: false > Statistics: Num rows: 0 Data size: 0 Basic stats: NONE > Column stats: COMPLETE > table: > input format: > org.apache.hadoop.mapred.TextInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > serde: > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > Reducer 7 > Reduce Operator Tree: > Merge Join Operator > condition map: > Inner Join 0 to 1 > keys: > 0 ss_item_sk (type: int), ss_ticket_number (type: int) > 1 sr_item_sk (type: int), sr_ticket_number (type: int) > outputColumnNames: _col1, _col6, _col8, _col27, _col34 > Filter Operator > predicate: ((_col1 = _col27) and (_col8 = _col34)) (type: > boolean) > Select Operator > expressions: _col1 (type: int), _col8 (type: int), _col6 > (type: int) > outputColumnNames: _col0, _col1, _col2 > Group By Operator > aggregations: count() > keys: _col2 (type: int), _col0 (type: int), _col1 > (type: int) > mode: hash > outputColumnNames: _col0, _col1, _col2, _col3 > Reduce Output Operator > key expressions: _col0 (type: int), _col1 (type: > int), _col2 (type: int) > sort order: +++ > Map-reduce partition columns: _col0 (type: int), > _col1 (type: int), _col2 (type: int) > value expressions: _col3 (type: bigint) > {code} > The full explain plan > {code} > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-1 > Tez > Edges: > Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Union 3 > (CONTAINS) > Reducer 4 <- Union 3 (SIMPLE_EDGE) > Reducer 7 <- Map 6 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE), Union 3 > (CONTAINS) > DagName: mmokhtar_20150214132727_95878ea1-ee6a-4b7e-bc86-843abd5cf664:7 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: a > filterExpr: (ss_item_sk is not null and ss_ticket_number is > not null) (type: boolean) > Statistics: Num rows: 550076554 Data size: 47370018896 > Basic stats: COMPLETE Column stats: 
COMPLETE > Filter Operator > predicate: (ss_item_sk is not null and ss_ticket_number > is not null) (type: boolean) > Statistics: Num rows: 550076554 Data size: 6549093948 > Basic stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: ss_item_sk (type: int), > ss_ticket_number (type: int) > sort order: ++ > Map-reduce partition columns: ss_item_sk (type: int), > ss_ticket_number (type: int) > Statistics: Num rows: 550076554 Data size: 6549093948 > Basic stats: COMPLETE Column stats: COMPLETE > value expressions: ss_store_sk (type: int) > Map 5 > Map Operator Tree: > TableScan > alias: b > filterExpr: (sr_item_sk is not null and sr_ticket_number is > not null) (type: boolean) > Statistics: Num rows: 55578005 Data size: 4155315616 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: (sr_item_sk is not null and sr_ticket_number > is not null) (type: boolean) > Statistics: Num rows: 55578005 Data size: 444624040 Basic > stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: sr_item_sk (type: int), > sr_ticket_number (type: int) > sort order: ++ > Map-reduce partition columns: sr_item_sk (type: int), > sr_ticket_number (type: int) > Statistics: Num rows: 55578005 Data size: 444624040 > Basic stats: COMPLETE Column stats: COMPLETE > Map 6 > Map Operator Tree: > TableScan > alias: c > filterExpr: (ss_item_sk is not null and ss_ticket_number is > not null) (type: boolean) > Statistics: Num rows: 550076554 Data size: 47370018896 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: (ss_item_sk is not null and ss_ticket_number > is not null) (type: boolean) > Statistics: Num rows: 550076554 Data size: 6549093948 > Basic stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: ss_item_sk (type: int), > ss_ticket_number (type: int) > sort order: ++ > Map-reduce partition columns: ss_item_sk (type: int), > ss_ticket_number (type: int) > Statistics: 
Num rows: 550076554 Data size: 6549093948 > Basic stats: COMPLETE Column stats: COMPLETE > value expressions: ss_store_sk (type: int) > Map 8 > Map Operator Tree: > TableScan > alias: d > filterExpr: (sr_item_sk is not null and sr_ticket_number is > not null) (type: boolean) > Statistics: Num rows: 55578005 Data size: 4155315616 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: (sr_item_sk is not null and sr_ticket_number > is not null) (type: boolean) > Statistics: Num rows: 55578005 Data size: 444624040 Basic > stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: sr_item_sk (type: int), > sr_ticket_number (type: int) > sort order: ++ > Map-reduce partition columns: sr_item_sk (type: int), > sr_ticket_number (type: int) > Statistics: Num rows: 55578005 Data size: 444624040 > Basic stats: COMPLETE Column stats: COMPLETE > Reducer 2 > Reduce Operator Tree: > Merge Join Operator > condition map: > Inner Join 0 to 1 > keys: > 0 ss_item_sk (type: int), ss_ticket_number (type: int) > 1 sr_item_sk (type: int), sr_ticket_number (type: int) > outputColumnNames: _col1, _col6, _col8, _col27, _col34 > Filter Operator > predicate: ((_col1 = _col27) and (_col8 = _col34)) (type: > boolean) > Select Operator > expressions: _col1 (type: int), _col8 (type: int), _col6 > (type: int) > outputColumnNames: _col0, _col1, _col2 > Group By Operator > aggregations: count() > keys: _col2 (type: int), _col0 (type: int), _col1 > (type: int) > mode: hash > outputColumnNames: _col0, _col1, _col2, _col3 > Reduce Output Operator > key expressions: _col0 (type: int), _col1 (type: > int), _col2 (type: int) > sort order: +++ > Map-reduce partition columns: _col0 (type: int), > _col1 (type: int), _col2 (type: int) > value expressions: _col3 (type: bigint) > Reducer 4 > Reduce Operator Tree: > Group By Operator > aggregations: count(VALUE._col0) > keys: KEY._col0 (type: int), KEY._col1 (type: int), KEY._col2 > (type: int) > mode: mergepartial > 
>                 outputColumnNames: _col0, _col1, _col2, _col3
>                 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                 Filter Operator
>                   predicate: (_col3 > 100000000) (type: boolean)
>                   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
>                   Select Operator
>                     expressions: _col3 (type: bigint)
>                     outputColumnNames: _col0
>                     Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
>                     File Output Operator
>                       compressed: false
>                       Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
>                       table:
>                           input format: org.apache.hadoop.mapred.TextInputFormat
>                           output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                           serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>         Reducer 7
>             Reduce Operator Tree:
>               Merge Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                 keys:
>                   0 ss_item_sk (type: int), ss_ticket_number (type: int)
>                   1 sr_item_sk (type: int), sr_ticket_number (type: int)
>                 outputColumnNames: _col1, _col6, _col8, _col27, _col34
>                 Filter Operator
>                   predicate: ((_col1 = _col27) and (_col8 = _col34)) (type: boolean)
>                   Select Operator
>                     expressions: _col1 (type: int), _col8 (type: int), _col6 (type: int)
>                     outputColumnNames: _col0, _col1, _col2
>                     Group By Operator
>                       aggregations: count()
>                       keys: _col2 (type: int), _col0 (type: int), _col1 (type: int)
>                       mode: hash
>                       outputColumnNames: _col0, _col1, _col2, _col3
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int), _col1 (type: int), _col2 (type: int)
>                         sort order: +++
>                         Map-reduce partition columns: _col0 (type: int), _col1 (type: int), _col2 (type: int)
>                         value expressions: _col3 (type: bigint)
>         Union 3
>             Vertex: Union 3
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
> TPC-DS Q54 also fails with OOM; this failure happens when we choose a
> different plan.
> The OOM happens in vertexName=Map 14
> {code}
> explain
> with my_customers as (
>  select c_customer_sk
>         , c_current_addr_sk
>  from
>         ( select cs_sold_date_sk sold_date_sk,
>                  cs_bill_customer_sk customer_sk,
>                  cs_item_sk item_sk
>           from   catalog_sales
>           union all
>           select ws_sold_date_sk sold_date_sk,
>                  ws_bill_customer_sk customer_sk,
>                  ws_item_sk item_sk
>           from   web_sales
>          ) cs_or_ws_sales,
>          item,
>          date_dim,
>          customer
>  where   sold_date_sk = d_date_sk
>          and item_sk = i_item_sk
>          and i_category = 'Jewelry'
>          and i_class = 'football'
>          and c_customer_sk = cs_or_ws_sales.customer_sk
>          and d_moy = 3
>          and d_year = 2000
>          group by  c_customer_sk
>         , c_current_addr_sk
>  )
>  , my_revenue as (
>  select c_customer_sk,
>         sum(ss_ext_sales_price) as revenue
>  from   my_customers,
>         store_sales,
>         customer_address,
>         store,
>         date_dim
>  where  c_current_addr_sk = ca_address_sk
>         and ca_county = s_county
>         and ca_state = s_state
>         and ss_sold_date_sk = d_date_sk
>         and c_customer_sk = ss_customer_sk
>         and d_month_seq between (1203)
>                            and  (1205)
>  group by c_customer_sk
>  )
>  , segments as
>  (select cast((revenue/50) as int) as segment
>   from   my_revenue
>  )
>   select  segment, count(*) as num_customers, segment*50 as segment_base
>  from segments
>  group by segment
>  order by segment, num_customers
>  limit 100
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       Edges:
>         Map 1 <- Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE)
>         Map 10 <- Map 13 (BROADCAST_EDGE), Union 11 (CONTAINS)
>         Map 12 <- Map 13 (BROADCAST_EDGE), Union 11 (CONTAINS)
>         Map 14 <- Union 11 (BROADCAST_EDGE)
>         Map 6 <- Map 7 (BROADCAST_EDGE), Reducer 9 (BROADCAST_EDGE)
>         Map 8 <- Map 14 (BROADCAST_EDGE)
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>         Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>         Reducer 4 <- Reducer 3 (SIMPLE_EDGE)
>         Reducer 9 <- Map 8 (SIMPLE_EDGE)
>       DagName: mmokhtar_20150208232525_9976b56b-8f4b-48c8-a909-aa653c20051c:1
>       Vertices:
>         Map 1
> Map Operator Tree: > TableScan > alias: store_sales > filterExpr: ss_customer_sk is not null (type: boolean) > Statistics: Num rows: 82510879939 Data size: 6873789738208 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: ss_customer_sk is not null (type: boolean) > Statistics: Num rows: 80566020964 Data size: 951594129356 > Basic stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: ss_customer_sk (type: int), > ss_ext_sales_price (type: float), ss_sold_date_sk (type: int) > outputColumnNames: _col0, _col1, _col2 > Statistics: Num rows: 80566020964 Data size: > 951594129356 Basic stats: COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Inner Join 0 to 1 > keys: > 0 _col2 (type: int) > 1 _col0 (type: int) > outputColumnNames: _col0, _col1 > input vertices: > 1 Map 5 > Statistics: Num rows: 90081226648 Data size: > 720649813184 Basic stats: COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Inner Join 0 to 1 > keys: > 0 _col0 (type: int) > 1 _col5 (type: int) > outputColumnNames: _col1, _col10 > input vertices: > 1 Map 6 > Statistics: Num rows: 99089351460 Data size: > 792714811684 Basic stats: COMPLETE Column stats: NONE > Select Operator > expressions: _col10 (type: int), _col1 (type: > float) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 99089351460 Data size: > 792714811684 Basic stats: COMPLETE Column stats: NONE > Group By Operator > aggregations: sum(_col1) > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0, _col1 > Statistics: Num rows: 99089351460 Data size: > 792714811684 Basic stats: COMPLETE Column stats: NONE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: > int) > Statistics: Num rows: 99089351460 Data size: > 792714811684 Basic stats: COMPLETE Column stats: NONE > value expressions: _col1 (type: double) > Execution mode: vectorized > Map 10 > Map Operator Tree: > 
TableScan > alias: catalog_sales > filterExpr: (cs_item_sk is not null and cs_bill_customer_sk > is not null) (type: boolean) > Filter Operator > predicate: (cs_item_sk is not null and > cs_bill_customer_sk is not null) (type: boolean) > Select Operator > expressions: cs_sold_date_sk (type: int), > cs_bill_customer_sk (type: int), cs_item_sk (type: int) > outputColumnNames: _col0, _col1, _col2 > Map Join Operator > condition map: > Inner Join 0 to 1 > keys: > 0 _col0 (type: int) > 1 _col0 (type: int) > outputColumnNames: _col1, _col2 > input vertices: > 1 Map 13 > Reduce Output Operator > key expressions: _col2 (type: int) > sort order: + > Map-reduce partition columns: _col2 (type: int) > value expressions: _col1 (type: int) > Execution mode: vectorized > Map 12 > Map Operator Tree: > TableScan > alias: web_sales > filterExpr: (ws_item_sk is not null and ws_bill_customer_sk > is not null) (type: boolean) > Filter Operator > predicate: (ws_item_sk is not null and > ws_bill_customer_sk is not null) (type: boolean) > Select Operator > expressions: ws_sold_date_sk (type: int), > ws_bill_customer_sk (type: int), ws_item_sk (type: int) > outputColumnNames: _col0, _col1, _col2 > Map Join Operator > condition map: > Inner Join 0 to 1 > keys: > 0 _col0 (type: int) > 1 _col0 (type: int) > outputColumnNames: _col1, _col2 > input vertices: > 1 Map 13 > Reduce Output Operator > key expressions: _col2 (type: int) > sort order: + > Map-reduce partition columns: _col2 (type: int) > value expressions: _col1 (type: int) > Execution mode: vectorized > Map 13 > Map Operator Tree: > TableScan > alias: date_dim > filterExpr: (((d_moy = 3) and (d_year = 2000)) and > d_date_sk is not null) (type: boolean) > Statistics: Num rows: 73049 Data size: 81741831 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: (((d_moy = 3) and (d_year = 2000)) and > d_date_sk is not null) (type: boolean) > Statistics: Num rows: 624 Data size: 7488 Basic stats: > COMPLETE Column 
stats: COMPLETE > Select Operator > expressions: d_date_sk (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 624 Data size: 2496 Basic stats: > COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 624 Data size: 2496 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: _col0 (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 624 Data size: 2496 Basic > stats: COMPLETE Column stats: COMPLETE > Group By Operator > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 312 Data size: 1248 Basic > stats: COMPLETE Column stats: COMPLETE > Dynamic Partitioning Event Operator > Target Input: catalog_sales > Partition key expr: cs_sold_date_sk > Statistics: Num rows: 312 Data size: 1248 Basic > stats: COMPLETE Column stats: COMPLETE > Target column: cs_sold_date_sk > Target Vertex: Map 10 > Select Operator > expressions: _col0 (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 624 Data size: 2496 Basic > stats: COMPLETE Column stats: COMPLETE > Group By Operator > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 312 Data size: 1248 Basic > stats: COMPLETE Column stats: COMPLETE > Dynamic Partitioning Event Operator > Target Input: web_sales > Partition key expr: ws_sold_date_sk > Statistics: Num rows: 312 Data size: 1248 Basic > stats: COMPLETE Column stats: COMPLETE > Target column: ws_sold_date_sk > Target Vertex: Map 12 > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 624 Data size: 2496 Basic > stats: COMPLETE Column stats: COMPLETE > Execution mode: vectorized > Map 14 > Map Operator Tree: > TableScan > alias: item > filterExpr: (((i_category = 'Jewelry') and (i_class = > 'football')) and i_item_sk is not 
null) (type: boolean) > Statistics: Num rows: 462000 Data size: 663862160 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: (((i_category = 'Jewelry') and (i_class = > 'football')) and i_item_sk is not null) (type: boolean) > Statistics: Num rows: 4200 Data size: 781200 Basic stats: > COMPLETE Column stats: COMPLETE > Select Operator > expressions: i_item_sk (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 4200 Data size: 16800 Basic > stats: COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Inner Join 0 to 1 > keys: > 0 _col2 (type: int) > 1 _col0 (type: int) > outputColumnNames: _col1 > input vertices: > 0 Union 11 > Statistics: Num rows: 79189328781 Data size: 0 Basic > stats: PARTIAL Column stats: NONE > Reduce Output Operator > key expressions: _col1 (type: int) > sort order: + > Map-reduce partition columns: _col1 (type: int) > Statistics: Num rows: 79189328781 Data size: 0 > Basic stats: PARTIAL Column stats: NONE > Execution mode: vectorized > Map 5 > Map Operator Tree: > TableScan > alias: date_dim > filterExpr: (d_month_seq BETWEEN 1203 AND 1205 and > d_date_sk is not null) (type: boolean) > Statistics: Num rows: 73049 Data size: 81741831 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: (d_month_seq BETWEEN 1203 AND 1205 and > d_date_sk is not null) (type: boolean) > Statistics: Num rows: 36524 Data size: 292192 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: d_date_sk (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 36524 Data size: 146096 Basic > stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 36524 Data size: 146096 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: _col0 (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 36524 Data 
size: 146096 Basic > stats: COMPLETE Column stats: COMPLETE > Group By Operator > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 18262 Data size: 73048 Basic > stats: COMPLETE Column stats: COMPLETE > Dynamic Partitioning Event Operator > Target Input: store_sales > Partition key expr: ss_sold_date_sk > Statistics: Num rows: 18262 Data size: 73048 > Basic stats: COMPLETE Column stats: COMPLETE > Target column: ss_sold_date_sk > Target Vertex: Map 1 > Execution mode: vectorized > Map 6 > Map Operator Tree: > TableScan > alias: customer_address > filterExpr: ((ca_county is not null and ca_state is not > null) and ca_address_sk is not null) (type: boolean) > Statistics: Num rows: 40000000 Data size: 40595195284 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: ((ca_county is not null and ca_state is not > null) and ca_address_sk is not null) (type: boolean) > Statistics: Num rows: 40000000 Data size: 7520000000 > Basic stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: ca_address_sk (type: int), ca_county > (type: string), ca_state (type: string) > outputColumnNames: _col0, _col1, _col2 > Statistics: Num rows: 40000000 Data size: 7520000000 > Basic stats: COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Inner Join 0 to 1 > keys: > 0 _col1 (type: string), _col2 (type: string) > 1 _col0 (type: string), _col1 (type: string) > outputColumnNames: _col0 > input vertices: > 1 Map 7 > Statistics: Num rows: 778829 Data size: 3115316 Basic > stats: COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Inner Join 0 to 1 > keys: > 0 _col0 (type: int) > 1 _col1 (type: int) > outputColumnNames: _col5 > input vertices: > 1 Reducer 9 > Statistics: Num rows: 47909545988 Data size: 0 > Basic stats: PARTIAL Column stats: NONE > Reduce Output Operator > key expressions: _col5 (type: int) > sort order: + > Map-reduce partition columns: _col5 (type: int) > 
Statistics: Num rows: 47909545988 Data size: 0 > Basic stats: PARTIAL Column stats: NONE > Execution mode: vectorized > Map 7 > Map Operator Tree: > TableScan > alias: store > filterExpr: (s_county is not null and s_state is not null) > (type: boolean) > Statistics: Num rows: 1704 Data size: 3256276 Basic stats: > COMPLETE Column stats: COMPLETE > Filter Operator > predicate: (s_county is not null and s_state is not null) > (type: boolean) > Statistics: Num rows: 1704 Data size: 313536 Basic stats: > COMPLETE Column stats: COMPLETE > Select Operator > expressions: s_county (type: string), s_state (type: > string) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 1704 Data size: 313536 Basic > stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: string), _col1 (type: > string) > sort order: ++ > Map-reduce partition columns: _col0 (type: string), > _col1 (type: string) > Statistics: Num rows: 1704 Data size: 313536 Basic > stats: COMPLETE Column stats: COMPLETE > Execution mode: vectorized > Map 8 > Map Operator Tree: > TableScan > alias: customer > filterExpr: (c_customer_sk is not null and > c_current_addr_sk is not null) (type: boolean) > Statistics: Num rows: 80000000 Data size: 68801615852 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: (c_customer_sk is not null and > c_current_addr_sk is not null) (type: boolean) > Statistics: Num rows: 80000000 Data size: 640000000 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: c_customer_sk (type: int), > c_current_addr_sk (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 80000000 Data size: 640000000 > Basic stats: COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Inner Join 0 to 1 > keys: > 0 _col0 (type: int) > 1 _col1 (type: int) > outputColumnNames: _col0, _col1 > input vertices: > 1 Map 14 > Statistics: Num rows: 87108263547 Data size: 0 Basic > stats: PARTIAL 
Column stats: NONE > Group By Operator > keys: _col0 (type: int), _col1 (type: int) > mode: hash > outputColumnNames: _col0, _col1 > Statistics: Num rows: 87108263547 Data size: 0 > Basic stats: PARTIAL Column stats: NONE > Reduce Output Operator > key expressions: _col0 (type: int), _col1 (type: > int) > sort order: ++ > Map-reduce partition columns: _col0 (type: int), > _col1 (type: int) > Statistics: Num rows: 87108263547 Data size: 0 > Basic stats: PARTIAL Column stats: NONE > Execution mode: vectorized > Reducer 2 > Reduce Operator Tree: > Group By Operator > aggregations: sum(VALUE._col0) > keys: KEY._col0 (type: int) > mode: mergepartial > outputColumnNames: _col0, _col1 > Statistics: Num rows: 49544675730 Data size: 396357405842 > Basic stats: COMPLETE Column stats: NONE > Select Operator > expressions: UDFToInteger((_col1 / 50.0)) (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 49544675730 Data size: 396357405842 > Basic stats: COMPLETE Column stats: NONE > Group By Operator > aggregations: count() > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0, _col1 > Statistics: Num rows: 49544675730 Data size: 396357405842 > Basic stats: COMPLETE Column stats: NONE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 49544675730 Data size: > 396357405842 Basic stats: COMPLETE Column stats: NONE > value expressions: _col1 (type: bigint) > Reducer 3 > Reduce Operator Tree: > Group By Operator > aggregations: count(VALUE._col0) > keys: KEY._col0 (type: int) > mode: mergepartial > outputColumnNames: _col0, _col1 > Statistics: Num rows: 24772337865 Data size: 198178702921 > Basic stats: COMPLETE Column stats: NONE > Select Operator > expressions: _col0 (type: int), _col1 (type: bigint), > (_col0 * 50) (type: int) > outputColumnNames: _col0, _col1, _col2 > Statistics: Num rows: 24772337865 Data size: 198178702921 > Basic stats: COMPLETE Column 
stats: NONE
>                   Reduce Output Operator
>                     key expressions: _col0 (type: int), _col1 (type: bigint)
>                     sort order: ++
>                     Statistics: Num rows: 24772337865 Data size: 198178702921 Basic stats: COMPLETE Column stats: NONE
>                     TopN Hash Memory Usage: 0.04
>                     value expressions: _col2 (type: int)
>         Reducer 4
>             Reduce Operator Tree:
>               Select Operator
>                 expressions: KEY.reducesinkkey0 (type: int), KEY.reducesinkkey1 (type: bigint), VALUE._col0 (type: int)
>                 outputColumnNames: _col0, _col1, _col2
>                 Statistics: Num rows: 24772337865 Data size: 198178702921 Basic stats: COMPLETE Column stats: NONE
>                 Limit
>                   Number of rows: 100
>                   Statistics: Num rows: 100 Data size: 800 Basic stats: COMPLETE Column stats: NONE
>                   File Output Operator
>                     compressed: false
>                     Statistics: Num rows: 100 Data size: 800 Basic stats: COMPLETE Column stats: NONE
>                     table:
>                         input format: org.apache.hadoop.mapred.TextInputFormat
>                         output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>         Reducer 9
>             Reduce Operator Tree:
>               Group By Operator
>                 keys: KEY._col0 (type: int), KEY._col1 (type: int)
>                 mode: mergepartial
>                 outputColumnNames: _col0, _col1
>                 Statistics: Num rows: 43554131773 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                 Reduce Output Operator
>                   key expressions: _col1 (type: int)
>                   sort order: +
>                   Map-reduce partition columns: _col1 (type: int)
>                   Statistics: Num rows: 43554131773 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                   value expressions: _col0 (type: int)
>         Union 11
>             Vertex: Union 11
>   Stage: Stage-0
>     Fetch Operator
>       limit: 100
>       Processor Tree:
>         ListSink
> {code}
> In Map 14 Data size is 0
> {code}
> p 14
>             Map Operator Tree:
>                 TableScan
>                   alias: item
>                   filterExpr: (((i_category = 'Jewelry') and (i_class = 'football')) and i_item_sk is not null) (type: boolean)
>                   Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (((i_category = 'Jewelry') and (i_class = 'football')) and i_item_sk is not null) (type: boolean)
>                     Statistics: Num rows: 4200 Data size: 781200 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: i_item_sk (type: int)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 4200 Data size: 16800 Basic stats: COMPLETE Column stats: COMPLETE
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         keys:
>                           0 _col2 (type: int)
>                           1 _col0 (type: int)
>                         outputColumnNames: _col1
>                         input vertices:
>                           0 Union 11
>                         Statistics: Num rows: 79189328781 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                         Reduce Output Operator
>                           key expressions: _col1 (type: int)
>                           sort order: +
>                           Map-reduce partition columns: _col1 (type: int)
>                           Statistics: Num rows: 79189328781 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>             Execution mode: vectorized
> {code}
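
The effect described in the report (few reducers, or a broadcast, when a vertex reports Data size 0) can be illustrated with a small sketch. This is not Hive's planner code. It only assumes a size-driven heuristic in the spirit of settings such as hive.exec.reducers.bytes.per.reducer and hive.auto.convert.join.noconditionaltask.size. The function names and the three constants below are hypothetical and the values are illustrative, not taken from the cluster in this report. The row counts and data sizes are the ones quoted in the plans above.

```python
import math

# Hypothetical planner knobs. The values are illustrative assumptions,
# not settings read from the cluster in this report.
BYTES_PER_REDUCER = 256 * 1024 * 1024    # target input bytes per reducer
MAX_REDUCERS = 1009                      # upper bound on the reducer count
BROADCAST_THRESHOLD = 10 * 1000 * 1000   # largest input still considered for broadcast


def estimate_reducers(data_size_bytes: int) -> int:
    """Size-based reducer estimate, clamped to the range [1, MAX_REDUCERS]."""
    wanted = math.ceil(data_size_bytes / BYTES_PER_REDUCER)
    return max(1, min(MAX_REDUCERS, wanted))


def would_broadcast(data_size_bytes: int) -> bool:
    """An input is a broadcast candidate when its estimated size fits the threshold."""
    return data_size_bytes <= BROADCAST_THRESHOLD


# Statistics as quoted in the plans above.
map_14_join = {"rows": 79189328781, "data_size": 0}          # Basic stats: PARTIAL
map_1_output = {"rows": 550076554, "data_size": 6549093948}  # Basic stats: COMPLETE

for name, stats in (("Map 14 join", map_14_join), ("Map 1 output", map_1_output)):
    print(name,
          "reducers:", estimate_reducers(stats["data_size"]),
          "broadcast candidate:", would_broadcast(stats["data_size"]))
```

Under these assumptions, the operator with about 79 billion rows but Data size 0 gets the minimum reducer count and passes the broadcast check, while the operator with complete statistics does neither. Any heuristic that looks only at Data size would treat the Union output as tiny in the same way.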