[ 
https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750878#comment-13750878
 ] 

Yin Huai commented on HIVE-5149:
--------------------------------

Suppose that we have a parent RS and a child RS. If the child RS can be 
removed, ReduceSinkDeDuplication always assigns the more specific partitioning 
columns to the parent RS. For example, for "GROUP BY a, b DISTRIBUTE BY a", 
the RS in the resulting single MR job uses both "a" and "b" as partitioning 
columns, so rows with the same "a" can be sent to different reducers, which 
is not what DISTRIBUTE BY a asks for. It seems we need to change 
ReduceSinkDeDuplication to use the more general partitioning columns instead, 
i.e. only "a" in this example. Note that this change can limit the 
parallelism of the reduce phase. 
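
To illustrate the problem, here is a minimal sketch in Python. It is not Hive 
code: partition_for and reducers_per_a are hypothetical helpers, and the hash 
(31 * h + value) is only a simplified stand-in for how an RS maps partitioning 
columns to a reducer. It shows which reducers receive the rows of each "a" 
value under the two choices of partitioning columns.

```python
from collections import defaultdict


def partition_for(row, partition_cols, num_reducers):
    """Pick a reducer from the values of the partitioning columns.

    Assumption: a simple 31*h + v hash, used here only for illustration.
    """
    h = 0
    for col in partition_cols:
        h = 31 * h + row[col]
    return h % num_reducers


def reducers_per_a(rows, partition_cols, num_reducers):
    """Map each value of column "a" to the set of reducers that see it."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["a"]].add(partition_for(row, partition_cols, num_reducers))
    return dict(seen)


rows = [
    {"a": 1, "b": 1},
    {"a": 1, "b": 2},
    {"a": 2, "b": 1},
    {"a": 2, "b": 2},
]

# Partitioning on ("a", "b"): rows sharing the same "a" are split up.
print(reducers_per_a(rows, ["a", "b"], 2))  # {1: {0, 1}, 2: {0, 1}}

# Partitioning on ("a") only: each "a" goes to exactly one reducer.
print(reducers_per_a(rows, ["a"], 2))  # {1: {1}, 2: {0}}
```

With only "a" as the partitioning column, the number of reducers that receive 
any data is bounded by the number of distinct "a" values, which is the 
parallelism cost mentioned above.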
                
> ReduceSinkDeDuplication can pick the wrong partitioning columns
> ---------------------------------------------------------------
>
>                 Key: HIVE-5149
>                 URL: https://issues.apache.org/jira/browse/HIVE-5149
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>
> https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
