[ https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13751502#comment-13751502 ]
Yin Huai commented on HIVE-5149:
--------------------------------

Another example, in reduce_deduplicate_extended.q, there is
{code:sql}
explain
from (select key, value from src group by key, value) s
select s.key group by s.key;
{code}

The plan is
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        s:src
          TableScan
            alias: src
            Select Operator
              expressions:
                    expr: key
                    type: string
                    expr: value
                    type: string
              outputColumnNames: key, value
              Group By Operator
                bucketGroup: false
                keys:
                      expr: key
                      type: string
                      expr: value
                      type: string
                mode: hash
                outputColumnNames: _col0, _col1
                Reduce Output Operator
                  key expressions:
                        expr: _col0
                        type: string
                        expr: _col1
                        type: string
                  sort order: ++
                  Map-reduce partition columns:
                        expr: _col0
                        type: string
                        expr: _col1
                        type: string
                  tag: -1
      Reduce Operator Tree:
        Group By Operator
          bucketGroup: false
          keys:
                expr: KEY._col0
                type: string
                expr: KEY._col1
                type: string
          mode: mergepartial
          outputColumnNames: _col0, _col1
          Select Operator
            expressions:
                  expr: _col0
                  type: string
            outputColumnNames: _col0
            Group By Operator
              bucketGroup: false
              keys:
                    expr: _col0
                    type: string
              mode: complete
              outputColumnNames: _col0
              Select Operator
                expressions:
                      expr: _col0
                      type: string
                outputColumnNames: _col0
                File Output Operator
                  compressed: false
                  GlobalTableId: 0
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
      limit: -1
{code}

I think the plan is wrong. We should use key as the partitioning column to make sure all rows associated with the same key will be assigned to the same reducer.
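To make the concern concrete, here is a small Python sketch of what a plan of this shape could do with two reducers. It is only an illustration under stated assumptions: `toy_partition` is a made-up deterministic partitioner (not Hive's actual hash function), `run_plan` is a hypothetical stand-in for the reduce side of the plan above, and the three input rows are invented for the example.

```python
from collections import defaultdict


def toy_partition(row, columns, num_reducers):
    """Hypothetical partitioner: NOT Hive's real hash, just deterministic."""
    text = "".join(row[c] for c in columns)
    return sum(ord(ch) for ch in text) % num_reducers


def run_plan(rows, partition_columns, num_reducers=2):
    """Simulate the reduce side of the plan for a given partitioning choice."""
    # Shuffle: each row goes to the reducer chosen by the partition columns.
    reducers = defaultdict(list)
    for row in rows:
        reducers[toy_partition(row, partition_columns, num_reducers)].append(row)

    output = []
    for reducer_id in sorted(reducers):
        # Inner group by (key, value), merged on the reducer.
        pairs = {(r["key"], r["value"]) for r in reducers[reducer_id]}
        # Outer group by key runs in the same reducer, so it only sees
        # the rows that were shuffled to this reducer.
        output.extend(sorted({key for key, _ in pairs}))
    return output


# Invented sample data: key "a" appears with two different values.
rows = [
    {"key": "a", "value": "x"},
    {"key": "a", "value": "y"},
    {"key": "b", "value": "x"},
]

by_key_and_value = run_plan(rows, ["key", "value"])
by_key_only = run_plan(rows, ["key"])

print("partitioned by (key, value):", by_key_and_value)
print("partitioned by key:", by_key_only)
```

With this toy partitioner the two rows for key "a" land on different reducers when partitioning by (key, value), so "a" is emitted twice. Partitioning by key alone keeps them together and each key is emitted once.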
> ReduceSinkDeDuplication can pick the wrong partitioning columns
> ---------------------------------------------------------------
>
>                 Key: HIVE-5149
>                 URL: https://issues.apache.org/jira/browse/HIVE-5149
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>
> https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E