[ https://issues.apache.org/jira/browse/HIVE-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266681#comment-14266681 ]
Chao commented on HIVE-9112:
----------------------------

Hi [~tedxu], this looks related to Constant Propagation. The (partial) plan with this optimization:

{noformat}
...
78  Stage: Stage-3
79    Map Reduce
80      Map Operator Tree:
81          TableScan
82            Reduce Output Operator
83              key expressions: _col1 (type: int), 1 (type: int)
84              sort order: ++
85              Map-reduce partition columns: _col1 (type: int)
86              Statistics: Num rows: 27 Data size: 3298 Basic stats: COMPLETE Column stats: NONE
87              value expressions: _col0 (type: int), _col3 (type: int)
88          TableScan
89            alias: lineitem
90            Statistics: Num rows: 100 Data size: 11999 Basic stats: COMPLETE Column stats: NONE
91            Filter Operator
92              predicate: ((((l_shipmode = 'AIR') and l_orderkey is not null) and l_linenumber is not null) and (l_linenumber = 1)) (type: boolean)
93              Statistics: Num rows: 6 Data size: 719 Basic stats: COMPLETE Column stats: NONE
94              Select Operator
95                expressions: l_orderkey (type: int), 1 (type: int)
96                outputColumnNames: _col0, _col1
97                Statistics: Num rows: 6 Data size: 719 Basic stats: COMPLETE Column stats: NONE
98                Group By Operator
99                  keys: _col0 (type: int), _col1 (type: int)
100                 mode: hash
101                 outputColumnNames: _col0, _col1
102                 Statistics: Num rows: 6 Data size: 719 Basic stats: COMPLETE Column stats: NONE
103                 Reduce Output Operator
104                   key expressions: _col0 (type: int), _col1 (type: int)
105                   sort order: ++
106                   Map-reduce partition columns: _col0 (type: int), _col1 (type: int)
107                   Statistics: Num rows: 6 Data size: 719 Basic stats: COMPLETE Column stats: NONE
108     Reduce Operator Tree:
109       Join Operator
110         condition map:
111              Left Semi Join 0 to 1
112         keys:
113           0 _col1 (type: int), _col4 (type: int)
114           1 _col0 (type: int), _col1 (type: int)
115         outputColumnNames: _col0, _col3
116         Statistics: Num rows: 29 Data size: 3627 Basic stats: COMPLETE Column stats: NONE
117         Select Operator
118           expressions: _col0 (type: int), _col3 (type: int)
119           outputColumnNames: _col0, _col1
120           Statistics: Num rows: 29 Data size: 3627 Basic stats: COMPLETE Column stats: NONE
121           File Output Operator
122             compressed: false
123             Statistics: Num rows: 29 Data size: 3627 Basic stats: COMPLETE Column stats: NONE
124             table:
125                 input format: org.apache.hadoop.mapred.TextInputFormat
126                 output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
127                 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
...
{noformat}

And the diff for this part (the left side is the plan without the optimization):

{noformat}
83c83
< key expressions: _col1 (type: int), _col4 (type: int)
---
> key expressions: _col1 (type: int), 1 (type: int)
85c85
< Map-reduce partition columns: _col1 (type: int), _col4 (type: int)
---
> Map-reduce partition columns: _col1 (type: int)
95c95
< expressions: l_orderkey (type: int), l_linenumber (type: int)
---
> expressions: l_orderkey (type: int), 1 (type: int)
{noformat}

Notice that on line 85 the MR partition column {{_col4}} has been optimized away, which makes the two inputs of the join inconsistent: this side is now partitioned on {{_col1}} only, while the other side (line 106) is still partitioned on both {{_col0}} and {{_col1}}. As a result, rows that should meet in the join can be hashed to different reducers, which produces wrong results.

I saw that [~navis] left a [comment|https://issues.apache.org/jira/browse/HIVE-7232?focusedCommentId=14032106&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14032106] about a similar issue; maybe it's related? I'm not an expert in Constant Propagation, so could you take a look at this issue? Thanks.

> Query may generate different results depending on the number of reducers
> ------------------------------------------------------------------------
>
>                 Key: HIVE-9112
>                 URL: https://issues.apache.org/jira/browse/HIVE-9112
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Chao
>            Assignee: Chao
>
> Some queries may generate different results depending on the number of
> reducers, for example, tests like ppd_multi_insert.q, join_nullsafe.q,
> subquery_in.q, etc.
> Take subquery_in.q as an example: if we add
> {noformat}
> set mapred.reduce.tasks=3;
> {noformat}
> to this test file, the result will be different (and wrong):
> {noformat}
> @@ -903,5 +903,3 @@ where li.l_linenumber = 1 and
>  POSTHOOK: type: QUERY
>  POSTHOOK: Input: default@lineitem
>  #### A masked pattern was here ####
> -108570 8571
> -4297 1798
> {noformat}
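
To illustrate why the mismatch in partition columns (line 85 versus line 106 of the plan) only shows up with more than one reducer, here is a minimal sketch. It is not Hive's actual partitioning code: {{reducerFor}} and {{countMismatches}} are hypothetical helpers, and {{Arrays.hashCode}} is only assumed as a stand-in for the real hash over the partition columns. One join input is routed on the order key alone, the other on the order key plus the constant 1.

```java
import java.util.Arrays;

class PartitionMismatchSketch {
    // Hypothetical stand-in for a shuffle partitioner (an assumption, not Hive's code):
    // hash the partition column values, then take the result modulo the reducer count.
    static int reducerFor(int numReducers, int... partitionCols) {
        return (Arrays.hashCode(partitionCols) & Integer.MAX_VALUE) % numReducers;
    }

    // Counts keys in 1..maxKey whose two join inputs are routed to different reducers.
    static int countMismatches(int numReducers, int maxKey) {
        int mismatches = 0;
        for (int orderKey = 1; orderKey <= maxKey; orderKey++) {
            // Side whose constant column was optimized away: partitioned on one column.
            int left = reducerFor(numReducers, orderKey);
            // Other side: still partitioned on both columns (the second is always 1 here).
            int right = reducerFor(numReducers, orderKey, 1);
            if (left != right) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        for (int reducers : new int[] {1, 3}) {
            System.out.println("reducers=" + reducers
                    + " mismatched keys=" + countMismatches(reducers, 10));
        }
    }
}
```

With a single reducer every row lands in the same place, so the inconsistency is invisible; with several reducers the two sides of the same key can be separated, which would be consistent with rows going missing from the join output once {{mapred.reduce.tasks=3}} is set.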