[ https://issues.apache.org/jira/browse/HIVE-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529498#comment-14529498 ]
Pengcheng Xiong commented on HIVE-8769: --------------------------------------- [~mohammedmostafa], I ran it with 1GB TPCDS and it seems that the PKFK is detected, although my plan is different from yours. {code} Vertex dependency in root stage Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Map 6 (SIMPLE_EDGE) Reducer 3 <- Reducer 2 (SIMPLE_EDGE) Reducer 4 <- Reducer 3 (SIMPLE_EDGE) Map 6 <- Map 7 (BROADCAST_EDGE) Stage-0 Fetch Operator limit:100 Stage-1 Reducer 4 File Output Operator [FS_33] compressed:false Statistics:Num rows: 100 Data size: 39600 Basic stats: COMPLETE Column stats: COMPLETE table:{"serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe","input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"} Limit [LIM_32] Number of rows:100 Statistics:Num rows: 100 Data size: 39600 Basic stats: COMPLETE Column stats: COMPLETE Select Operator [SEL_31] | outputColumnNames:["_col0","_col1","_col2"] | Statistics:Num rows: 142657470 Data size: 56492358120 Basic stats: COMPLETE Column stats: COMPLETE |<-Reducer 3 [SIMPLE_EDGE] Reduce Output Operator [RS_30] key expressions:_col0 (type: string) sort order:+ Statistics:Num rows: 142657470 Data size: 56492358120 Basic stats: COMPLETE Column stats: COMPLETE value expressions:_col1 (type: string), _col2 (type: decimal(7,2)) Group By Operator [GBY_28] | keys:KEY._col0 (type: string), KEY._col1 (type: string), KEY._col2 (type: decimal(7,2)) | outputColumnNames:["_col0","_col1","_col2"] | Statistics:Num rows: 142657470 Data size: 56492358120 Basic stats: COMPLETE Column stats: COMPLETE |<-Reducer 2 [SIMPLE_EDGE] Reduce Output Operator [RS_27] key expressions:_col0 (type: string), _col1 (type: string), _col2 (type: decimal(7,2)) Map-reduce partition columns:_col0 (type: string), _col1 (type: string), _col2 (type: decimal(7,2)) sort order:+++ Statistics:Num rows: 142657470 Data size: 56492358120 Basic stats: COMPLETE Column stats: COMPLETE Group By Operator [GBY_26] keys:_col0 (type: string), _col1 (type: string), _col2 (type: decimal(7,2)) outputColumnNames:["_col0","_col1","_col2"] Statistics:Num rows: 142657470 Data size: 56492358120 Basic stats: COMPLETE Column stats: COMPLETE Select Operator [SEL_24] outputColumnNames:["_col0","_col1","_col2"] Statistics:Num rows: 142657470 Data size: 56492358120 Basic stats: COMPLETE Column stats: COMPLETE Merge Join Operator [MERGEJOIN_49] | condition map:[{"":"Inner Join 0 to 1"},{"":"Inner Join 1 to 2"}] | keys:{"2":"_col1 (type: int)","1":"_col0 (type: int)","0":"_col0 (type: int)"} | outputColumnNames:["_col2","_col3","_col4"] | Statistics:Num rows: 142657470 Data size: 56492358120 Basic stats: COMPLETE Column stats: COMPLETE |<-Map 1 [SIMPLE_EDGE] | Reduce Output Operator [RS_18] | key expressions:_col0 (type: int) | Map-reduce partition columns:_col0 (type: int) | sort order:+ | Statistics:Num rows: 2880404 Data size: 11521616 Basic stats: COMPLETE Column stats: COMPLETE | Select Operator [SEL_1] | outputColumnNames:["_col0"] | Statistics:Num rows: 2880404 Data size: 11521616 Basic stats: COMPLETE Column stats: COMPLETE | Filter Operator [FIL_44] | predicate:ss_item_sk is not null (type: boolean) | Statistics:Num rows: 2880404 Data size: 11521616 Basic stats: COMPLETE Column stats: COMPLETE | TableScan [TS_0] | alias:store_sales | Statistics:Num rows: 2880404 Data size: 385565005 Basic stats: COMPLETE Column stats: COMPLETE |<-Map 5 [SIMPLE_EDGE] | Reduce Output Operator [RS_20] | key expressions:_col0 (type: int) | Map-reduce partition columns:_col0 (type: int) | sort order:+ | Statistics:Num rows: 4500 Data size: 1798768 Basic stats: COMPLETE Column stats: COMPLETE | value expressions:_col1 (type: string), _col2 (type: string), _col3 (type: decimal(7,2)) | Select Operator [SEL_4] | outputColumnNames:["_col0","_col1","_col2","_col3"] | Statistics:Num rows: 4500 Data size: 1798768 Basic stats: COMPLETE Column stats: COMPLETE | Filter Operator [FIL_45] | predicate:((i_current_price BETWEEN 30 AND 60 and (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: boolean) | Statistics:Num rows: 4500 Data size: 1816728 Basic stats: COMPLETE Column stats: COMPLETE | TableScan [TS_2] | alias:item | Statistics:Num rows: 18000 Data size: 5033899 Basic stats: COMPLETE Column stats: COMPLETE |<-Map 6 [SIMPLE_EDGE] Reduce Output Operator [RS_22] key expressions:_col1 (type: int) Map-reduce partition columns:_col1 (type: int) sort order:+ Statistics:Num rows: 2936210 Data size: 11744840 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator [MAPJOIN_48] | condition map:[{"":"Inner Join 0 to 1"}] | keys:{"Map 7":"_col0 (type: int)","Map 6":"_col0 (type: int)"} | outputColumnNames:["_col1"] | Statistics:Num rows: 2936210 Data size: 11744840 Basic stats: COMPLETE Column stats: COMPLETE |<-Map 7 [BROADCAST_EDGE] | Reduce Output Operator [RS_14] | key expressions:_col0 (type: int) | Map-reduce partition columns:_col0 (type: int) | sort order:+ | Statistics:Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE | Select Operator [SEL_10] | outputColumnNames:["_col0"] | Statistics:Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE | Filter Operator [FIL_47] | predicate:(d_date BETWEEN '2002-05-30' AND '2002-07-30' and d_date_sk is not null) (type: boolean) | Statistics:Num rows: 36524 Data size: 3579352 Basic stats: COMPLETE Column stats: COMPLETE | TableScan [TS_8] | alias:date_dim | Statistics:Num rows: 73049 Data size: 10244389 Basic stats: COMPLETE Column stats: COMPLETE |<-Select Operator [SEL_7] outputColumnNames:["_col0","_col1"] Statistics:Num rows: 5872500 Data size: 46980000 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator [FIL_46] predicate:((inv_quantity_on_hand BETWEEN 100 AND 500 and inv_date_sk is not null) and inv_item_sk is not null) (type: boolean) Statistics:Num rows: 5872500 Data size: 69296172 Basic stats: COMPLETE Column stats: COMPLETE TableScan [TS_5] alias:inventory Statistics:Num rows: 11745000 Data size: 224675139 Basic stats: COMPLETE Column stats: COMPLETE Time taken: 0.495 seconds, Fetched: 117 row(s) {code} And, If I only join "item, inventory, date_dim" by using the following query {code} select i_item_id ,i_item_desc ,i_current_price from item, inventory, date_dim where i_current_price between 30 and 30+30 and inv_item_sk = i_item_sk and d_date_sk=inv_date_sk and d_date between '2002-05-30' and '2002-07-30' and i_manufact_id in (437,129,727,663) and inv_quantity_on_hand between 100 and 500 group by i_item_id,i_item_desc,i_current_price order by i_item_id limit 100; {code} the plan is {code} Plan optimized by CBO. Vertex dependency in root stage Reducer 2 <- Map 1 (SIMPLE_EDGE) Reducer 3 <- Reducer 2 (SIMPLE_EDGE) Map 1 <- Map 4 (BROADCAST_EDGE), Map 5 (BROADCAST_EDGE) Stage-0 Fetch Operator limit:100 Stage-1 Reducer 3 File Output Operator [FS_28] compressed:false Statistics:Num rows: 100 Data size: 39600 Basic stats: COMPLETE Column stats: COMPLETE table:{"serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe","input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"} Limit [LIM_27] Number of rows:100 Statistics:Num rows: 100 Data size: 39600 Basic stats: COMPLETE Column stats: COMPLETE Select Operator [SEL_26] | outputColumnNames:["_col0","_col1","_col2"] | Statistics:Num rows: 734053 Data size: 290684988 Basic stats: COMPLETE Column stats: COMPLETE |<-Reducer 2 [SIMPLE_EDGE] Reduce Output Operator [RS_25] key expressions:_col0 (type: string) sort order:+ Statistics:Num rows: 734053 Data size: 290684988 Basic stats: COMPLETE Column stats: COMPLETE value expressions:_col1 (type: string), _col2 (type: decimal(7,2)) Group By Operator [GBY_23] | keys:KEY._col0 (type: string), KEY._col1 (type: string), KEY._col2 (type: decimal(7,2)) | outputColumnNames:["_col0","_col1","_col2"] | Statistics:Num rows: 734053 Data size: 290684988 Basic stats: COMPLETE Column stats: COMPLETE |<-Map 1 [SIMPLE_EDGE] Reduce Output Operator [RS_22] key expressions:_col0 (type: string), _col1 (type: string), _col2 (type: decimal(7,2)) Map-reduce partition columns:_col0 (type: string), _col1 (type: string), _col2 (type: decimal(7,2)) sort order:+++ Statistics:Num rows: 734053 Data size: 290684988 Basic stats: COMPLETE Column stats: COMPLETE Group By Operator [GBY_21] keys:_col0 (type: string), _col1 (type: string), _col2 (type: decimal(7,2)) outputColumnNames:["_col0","_col1","_col2"] Statistics:Num rows: 734053 Data size: 290684988 Basic stats: COMPLETE Column stats: COMPLETE Select Operator [SEL_19] outputColumnNames:["_col0","_col1","_col2"] Statistics:Num rows: 734053 Data size: 290684988 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator [MAPJOIN_38] | condition map:[{"":"Inner Join 0 to 1"}] | keys:{"Map 1":"_col1 (type: int)","Map 5":"_col0 (type: int)"} | outputColumnNames:["_col6","_col7","_col8"] | Statistics:Num rows: 734053 Data size: 290684988 Basic stats: COMPLETE Column stats: COMPLETE |<-Map 5 [BROADCAST_EDGE] | Reduce Output Operator [RS_17] | key expressions:_col0 (type: int) | Map-reduce partition columns:_col0 (type: int) | sort order:+ | Statistics:Num rows: 4500 Data size: 1798768 Basic stats: COMPLETE Column stats: COMPLETE | value expressions:_col1 (type: string), _col2 (type: string), _col3 (type: decimal(7,2)) | Select Operator [SEL_8] | outputColumnNames:["_col0","_col1","_col2","_col3"] | Statistics:Num rows: 4500 Data size: 1798768 Basic stats: COMPLETE Column stats: COMPLETE | Filter Operator [FIL_36] | predicate:((i_current_price BETWEEN 30 AND 60 and (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: boolean) | Statistics:Num rows: 4500 Data size: 1816728 Basic stats: COMPLETE Column stats: COMPLETE | TableScan [TS_6] | alias:item | Statistics:Num rows: 18000 Data size: 5033899 Basic stats: COMPLETE Column stats: COMPLETE |<-Map Join Operator [MAPJOIN_37] | condition map:[{"":"Inner Join 0 to 1"}] | keys:{"Map 1":"_col0 (type: int)","Map 4":"_col0 (type: int)"} | outputColumnNames:["_col1"] | Statistics:Num rows: 2936210 Data size: 11744840 Basic stats: COMPLETE Column stats: COMPLETE |<-Map 4 [BROADCAST_EDGE] | Reduce Output Operator [RS_12] | key expressions:_col0 (type: int) | Map-reduce partition columns:_col0 (type: int) | sort order:+ | Statistics:Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE | Select Operator [SEL_5] | outputColumnNames:["_col0"] | Statistics:Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE | Filter Operator [FIL_35] | predicate:(d_date BETWEEN '2002-05-30' AND '2002-07-30' and d_date_sk is not null) (type: boolean) | Statistics:Num rows: 36524 Data size: 3579352 Basic stats: COMPLETE Column stats: COMPLETE | TableScan [TS_3] | alias:date_dim | Statistics:Num rows: 73049 Data size: 10244389 Basic stats: COMPLETE Column stats: COMPLETE |<-Select Operator [SEL_2] outputColumnNames:["_col0","_col1"] Statistics:Num rows: 5872500 Data size: 46980000 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator [FIL_34] predicate:((inv_quantity_on_hand BETWEEN 100 AND 500 and inv_date_sk is not null) and inv_item_sk is not null) (type: boolean) Statistics:Num rows: 5872500 Data size: 69296172 Basic stats: COMPLETE Column stats: COMPLETE TableScan [TS_0] alias:inventory Statistics:Num rows: 11745000 Data size: 224675139 Basic stats: COMPLETE Column stats: COMPLETE Time taken: 0.427 seconds, Fetched: 95 row(s) {code} As you can see, PK-FK is detected when inventory is joining date_dim (selectivity around 0.5) and then it is joining item, PK-FK is also detected (selectivity=0.25). Could you please double check or provide me another query that has the same issue? Thanks! > Physical optimizer : Incorrect CE results in a shuffle join instead of a Map > join (PK/FK pattern not detected) > -------------------------------------------------------------------------------------------------------------- > > Key: HIVE-8769 > URL: https://issues.apache.org/jira/browse/HIVE-8769 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer > Affects Versions: 0.14.0 > Reporter: Mostafa Mokhtar > Assignee: Prasanth Jayachandran > Fix For: 1.2.0 > > > TPC-DS Q82 is running slower than hive 13 because the join type is not > correct. > The estimate for item x inventory x date_dim is 227 Million rows while the > actual is 3K rows. > Hive 13 finishes in 753 seconds. > Hive 14 finishes in 1,267 seconds. > Hive 14 + force map join finished in 431 seconds. > Query > {code} > select i_item_id > ,i_item_desc > ,i_current_price > from item, inventory, date_dim, store_sales > where i_current_price between 30 and 30+30 > and inv_item_sk = i_item_sk > and d_date_sk=inv_date_sk > and d_date between '2002-05-30' and '2002-07-30' > and i_manufact_id in (437,129,727,663) > and inv_quantity_on_hand between 100 and 500 > and ss_item_sk = i_item_sk > group by i_item_id,i_item_desc,i_current_price > order by i_item_id > limit 100 > {code} > Plan > {code} > STAGE PLANS: > Stage: Stage-1 > Tez > Edges: > Map 7 <- Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE) > Reducer 4 <- Map 3 (SIMPLE_EDGE), Map 7 (SIMPLE_EDGE) > Reducer 5 <- Reducer 4 (SIMPLE_EDGE) > Reducer 6 <- Reducer 5 (SIMPLE_EDGE) > DagName: mmokhtar_20141106005353_7a2eb8df-12ff-4fe9-89b4-30f1e4e3fb90:1 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: item > filterExpr: ((i_current_price BETWEEN 30 AND 60 and > (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: > boolean) > Statistics: Num rows: 462000 Data size: 663862160 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: ((i_current_price BETWEEN 30 AND 60 and > (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: > boolean) > Statistics: Num rows: 115500 Data size: 34185680 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: i_item_sk (type: int), i_item_id (type: > string), i_item_desc (type: string), i_current_price (type: float) > outputColumnNames: _col0, _col1, _col2, _col3 > Statistics: Num rows: 115500 Data size: 33724832 Basic > stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 115500 Data size: 33724832 > Basic stats: COMPLETE Column stats: COMPLETE > value expressions: _col1 (type: string), _col2 (type: > string), _col3 (type: float) > Execution mode: vectorized > Map 2 > Map Operator Tree: > TableScan > alias: date_dim > filterExpr: (d_date BETWEEN '2002-05-30' AND '2002-07-30' > and d_date_sk is not null) (type: boolean) > Statistics: Num rows: 73049 Data size: 81741831 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: (d_date BETWEEN '2002-05-30' AND '2002-07-30' > and d_date_sk is not null) (type: boolean) > Statistics: Num rows: 36524 Data size: 3579352 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: d_date_sk (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 36524 Data size: 146096 Basic > stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 36524 Data size: 146096 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: _col0 (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 36524 Data size: 146096 Basic > stats: COMPLETE Column stats: COMPLETE > Group By Operator > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 18262 Data size: 73048 Basic > stats: COMPLETE Column stats: COMPLETE > Dynamic Partitioning Event Operator > Target Input: inventory > Partition key expr: inv_date_sk > Statistics: Num rows: 18262 Data size: 73048 > Basic stats: COMPLETE Column stats: COMPLETE > Target column: inv_date_sk > Target Vertex: Map 7 > Execution mode: vectorized > Map 3 > Map Operator Tree: > TableScan > alias: store_sales > filterExpr: ss_item_sk is not null (type: boolean) > Statistics: Num rows: 82510879939 Data size: 6873789738208 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: ss_item_sk is not null (type: boolean) > Statistics: Num rows: 82510879939 Data size: 330043519756 > Basic stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: ss_item_sk (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 82510879939 Data size: > 330043519756 Basic stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 82510879939 Data size: > 330043519756 Basic stats: COMPLETE Column stats: COMPLETE > Execution mode: vectorized > Map 7 > Map Operator Tree: > TableScan > alias: inventory > filterExpr: (inv_quantity_on_hand BETWEEN 100 AND 500 and > inv_item_sk is not null) (type: boolean) > Statistics: Num rows: 1,627,857,000 Data size: 19208695084 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: (inv_quantity_on_hand BETWEEN 100 AND 500 and > inv_item_sk is not null) (type: boolean) > Statistics: Num rows: 813,928,500 Data size: 9604347540 > Basic stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: inv_item_sk (type: int), inv_date_sk > (type: int) > outputColumnNames: _col0, _col2 > Statistics: Num rows: 813,928,500 Data size: 6511428000 > Basic stats: COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Inner Join 0 to 1 > condition expressions: > 0 {_col2} > 1 {_col0} {_col1} {_col2} {_col3} > keys: > 0 _col0 (type: int) > 1 _col0 (type: int) > outputColumnNames: _col2, _col3, _col4, _col5, _col6 > input vertices: > 1 Map 1 > Statistics: Num rows: 203,482,128 Data size: > 59416781376 Basic stats: COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Inner Join 0 to 1 > condition expressions: > 0 {_col3} {_col4} {_col5} {_col6} > 1 > keys: > 0 _col2 (type: int) > 1 _col0 (type: int) > outputColumnNames: _col3, _col4, _col5, _col6 > input vertices: > 1 Map 2 > Statistics: Num rows: 227,514,273 Data size: > 66434167716 Basic stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: _col3 (type: int), _col4 (type: > string), _col5 (type: string), _col6 (type: float) > outputColumnNames: _col3, _col4, _col5, _col6 > Statistics: Num rows: 227514273 Data size: > 66434167716 Basic stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col3 (type: int) > sort order: + > Map-reduce partition columns: _col3 (type: int) > Statistics: Num rows: 227514273 Data size: > 66434167716 Basic stats: COMPLETE Column stats: COMPLETE > value expressions: _col4 (type: string), _col5 > (type: string), _col6 (type: float) > Execution mode: vectorized > Reducer 4 > Reduce Operator Tree: > Merge Join Operator > condition map: > Inner Join 0 to 1 > condition expressions: > 0 > 1 {VALUE._col3} {VALUE._col4} {VALUE._col5} > outputColumnNames: _col5, _col6, _col7 > Statistics: Num rows: 5156859392 Data size: 1485175504896 > Basic stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: _col5 (type: string), _col6 (type: string), > _col7 (type: float) > outputColumnNames: _col0, _col1, _col2 > Statistics: Num rows: 5156859392 Data size: 1485175504896 > Basic stats: COMPLETE Column stats: COMPLETE > Group By Operator > keys: _col0 (type: string), _col1 (type: string), _col2 > (type: float) > mode: hash > outputColumnNames: _col0, _col1, _col2 > Statistics: Num rows: 5156859392 Data size: 1485175504896 > Basic stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: string), _col1 (type: > string), _col2 (type: float) > sort order: +++ > Map-reduce partition columns: _col0 (type: string), > _col1 (type: string), _col2 (type: float) > Statistics: Num rows: 5156859392 Data size: > 1485175504896 Basic stats: COMPLETE Column stats: COMPLETE > Reducer 5 > Reduce Operator Tree: > Group By Operator > keys: KEY._col0 (type: string), KEY._col1 (type: string), > KEY._col2 (type: float) > mode: mergepartial > outputColumnNames: _col0, _col1, _col2 > Statistics: Num rows: 5156859392 Data size: 1485175504896 > Basic stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: _col0 (type: string), _col1 (type: string), > _col2 (type: float) > outputColumnNames: _col0, _col1, _col2 > Statistics: Num rows: 5156859392 Data size: 1485175504896 > Basic stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: string) > sort order: + > Statistics: Num rows: 5156859392 Data size: 1485175504896 > Basic stats: COMPLETE Column stats: COMPLETE > TopN Hash Memory Usage: 0.04 > value expressions: _col1 (type: string), _col2 (type: > float) > Execution mode: vectorized > Reducer 6 > Reduce Operator Tree: > Select Operator > expressions: KEY.reducesinkkey0 (type: string), VALUE._col0 > (type: string), VALUE._col1 (type: float) > outputColumnNames: _col0, _col1, _col2 > Statistics: Num rows: 5156859392 Data size: 1485175504896 > Basic stats: COMPLETE Column stats: COMPLETE > Limit > Number of rows: 100 > Statistics: Num rows: 100 Data size: 28800 Basic stats: > COMPLETE Column stats: COMPLETE > File Output Operator > compressed: false > Statistics: Num rows: 100 Data size: 28800 Basic stats: > COMPLETE Column stats: COMPLETE > table: > input format: org.apache.hadoop.mapred.TextInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > serde: > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > Execution mode: vectorized > Stage: Stage-0 > Fetch Operator > limit: 100 > Processor Tree: > ListSink > {code} > Actual rows counts > {code} > VERTICES TOTAL_TASKS FAILED_ATTEMPTS KILLED_TASKS DURATION_SECONDS > CPU_TIME_MILLIS GC_TIME_MILLIS INPUT_RECORDS OUTPUT_RECORDS > Map 1 1 0 0 1.01 > 1,280 0 462,000 65 > Map 2 1 0 0 0.41 > 400 23 10,000 62 > Map 3 2574 0 0 947.79 > 442,220,640 1,887,714 82,510,879,939 82,510,879,939 > Map 7 9 0 0 869.22 > 42,490 1,215 56,133,127 3,081 > Reducer 4 1009 0 0 389.75 > 69,471,510 1,149,529 82,510,883,020 33 > Reducer 5 253 0 0 93.73 > 938,930 26,150 33 33 > Reducer 6 1 0 0 2.08 > 730 10 33 33 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)