[ 
https://issues.apache.org/jira/browse/HIVE-11032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603481#comment-14603481
 ] 

Mohit Sabharwal commented on HIVE-11032:
----------------------------------------

Thanks [~lirui]. Yes, I verified that the query plan is in line with what we see in MR.

When {{hive.groupby.skewindata=true}} is set, unless there is a distinct 
clause, the Reduce Output Operator partitions based on {{rand()}}. (The 
first reducer then does partial aggregation, and the following reducer does 
the final aggregation.)
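
As a rough illustration of why this plan helps with skew, here is a minimal 
Python sketch of the two-stage pattern, not Hive code. The function names, 
the reducer count, and the use of a simple count aggregate are all 
assumptions made for the example: stage 1 assigns rows to reducers at random 
(standing in for partitioning on {{rand()}}) and aggregates partially, and 
stage 2 routes the partial results by key and merges them.

```python
import random
from collections import defaultdict


def partial_aggregate(rows, num_reducers, rng):
    """Stage 1: assign each row to a random reducer and count per key there."""
    partials = [defaultdict(int) for _ in range(num_reducers)]
    for key in rows:
        # Random assignment spreads a hot key across all reducers.
        partials[rng.randrange(num_reducers)][key] += 1
    return partials


def final_aggregate(partials, num_reducers):
    """Stage 2: route partial counts by key hash and merge them."""
    finals = [defaultdict(int) for _ in range(num_reducers)]
    for partial in partials:
        for key, count in partial.items():
            # Hash partitioning sends every partial for a key to one reducer.
            finals[hash(key) % num_reducers][key] += count
    merged = {}
    for reducer in finals:
        merged.update(reducer)
    return merged


def skewed_count(rows, num_reducers=4, seed=0):
    """Run both stages; the seed only makes the random split repeatable."""
    rng = random.Random(seed)
    partials = partial_aggregate(rows, num_reducers, rng)
    return final_aggregate(partials, num_reducers)


# Hypothetical skewed input: one hot key plus a few rare keys.
rows = ["hot"] * 1000 + ["a", "b", "c"]
counts = skewed_count(rows)
```

In this sketch no stage 1 reducer receives all of the rows for the hot key, 
and stage 2 only has to merge at most one partial count per key per stage 1 
reducer.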

I also verified the behavior for other cases, for example when 
{{hive.map.aggr=true}} is set in addition to {{hive.groupby.skewindata=true}}, 
as documented here: 
https://cwiki.apache.org/confluence/display/Hive/GroupByWithRollup

The {{index_bitmap3}} test failure is unrelated to this patch. 

> Enable more tests for grouping by skewed data [Spark Branch]
> ------------------------------------------------------------
>
>                 Key: HIVE-11032
>                 URL: https://issues.apache.org/jira/browse/HIVE-11032
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Mohit Sabharwal
>            Priority: Minor
>         Attachments: HIVE-11032.1-spark.patch, HIVE-11032.2-spark.patch
>
>
> Not all such tests are enabled, e.g. {{groupby1_map_skew.q}}. We can use 
> this JIRA to track whether we need more of them.
> Basically, we need to look at all tests with {{set 
> hive.groupby.skewindata=true;}}.



