[ https://issues.apache.org/jira/browse/HIVE-11032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603481#comment-14603481 ]
Mohit Sabharwal commented on HIVE-11032:
----------------------------------------

Thanks [~lirui], yes, I verified that the query plan is in line with what we see in MR.

When {{hive.groupby.skewindata=true}} is set, unless there is a distinct clause, the Reduce Output Operator partitions based on {{rand()}}. (The subsequent reducer then does partial aggregation, and the following reducer does final aggregation.)

I also verified the behavior for other cases, for example when {{hive.map.aggr=true}} is set in addition to {{hive.groupby.skewindata=true}}, as documented here:
https://cwiki.apache.org/confluence/display/Hive/GroupByWithRollup

The {{index_bitmap3}} test failure is unrelated to this patch.

> Enable more tests for grouping by skewed data [Spark Branch]
> ------------------------------------------------------------
>
>                 Key: HIVE-11032
>                 URL: https://issues.apache.org/jira/browse/HIVE-11032
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Mohit Sabharwal
>            Priority: Minor
>         Attachments: HIVE-11032.1-spark.patch, HIVE-11032.2-spark.patch
>
>
> Not all of such tests are enabled, e.g. {{groupby1_map_skew.q}}. We can use
> this JIRA to track whether we need more of them.
> Basically, we need to look at all tests with {{set
> hive.groupby.skewindata=true;}}.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
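
For readers unfamiliar with the plan shape described in the comment (partition on {{rand()}}, then partial aggregation, then final aggregation), here is a minimal Python sketch of the idea for a simple count. It is an illustration only, not Hive's or Spark's implementation: the function name {{two_stage_count}}, the {{num_reducers}} and {{seed}} parameters, and the use of Python's {{hash()}} for the second shuffle are all assumptions made for this example.

```python
import random
from collections import Counter


def two_stage_count(rows, num_reducers=4, seed=0):
    """Count rows per key with a random first shuffle and a keyed second shuffle.

    Hypothetical illustration of the two-reducer plan; not Hive code.
    """
    rng = random.Random(seed)

    # Stage 1: send each row to a reducer chosen at random (the rand()
    # partitioning), so one hot key is spread over all reducers. Each
    # reducer keeps a partial count per key.
    stage1 = [Counter() for _ in range(num_reducers)]
    for key in rows:
        stage1[rng.randrange(num_reducers)][key] += 1

    # Stage 2: repartition the partial counts by key, so all partials for
    # a given key meet on the same reducer, and merge them (final aggregation).
    stage2 = [Counter() for _ in range(num_reducers)]
    for partial in stage1:
        for key, count in partial.items():
            stage2[hash(key) % num_reducers][key] += count

    # Collect the per-reducer results into one mapping.
    result = Counter()
    for part in stage2:
        result.update(part)

    # Also return how many rows each stage-1 reducer handled.
    stage1_loads = [sum(p.values()) for p in stage1]
    return result, stage1_loads


if __name__ == "__main__":
    skewed = ["hot"] * 1000 + ["a", "b", "c"]
    counts, loads = two_stage_count(skewed)
    print(dict(counts))
    print(loads)
```

The second return value shows how the skewed key's rows are divided among the first-stage reducers instead of all landing on one of them, which is the point of the random partitioning.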