[ 
https://issues.apache.org/jira/browse/HIVE-15683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-15683:
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.2.0
     Release Note: Document the new configuration for 2.2.0.
           Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the review, Chao!

> Make what's done in HIVE-15580 for group by configurable
> --------------------------------------------------------
>
>                 Key: HIVE-15683
>                 URL: https://issues.apache.org/jira/browse/HIVE-15683
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>    Affects Versions: 2.2.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>              Labels: TODOC2.2
>             Fix For: 2.2.0
>
>         Attachments: HIVE-15683.1.patch, HIVE-15683.2.patch, HIVE-15683.patch
>
>
> HIVE-15580 changed the way data is shuffled for group by: instead of 
> using Spark's groupByKey to shuffle data, Hive on Spark now uses 
> repartitionAndSortWithinPartitions(), which generates (key, value) pairs 
> instead of the original (key, value iterator) pairs. This might have some 
> performance implications, but it is needed to get rid of the unbounded 
> memory usage of {{groupByKey}}.
> Here we'd like to compare group by performance with and without HIVE-15580. 
> If the impact is significant, we can provide a configuration that allows 
> users to switch back to the original way of shuffling.
> This work should ideally be done after HIVE-15682, as the optimization there 
> should help the performance here as well.
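
To illustrate the difference between the two shuffle output shapes described above, here is a minimal plain-Python sketch. It does not call Spark or Hive. The helper names (group_by_key, sorted_pairs, streaming_sum) are hypothetical and only mimic the shapes: (key, all values) for groupByKey, and flat, key-sorted (key, value) pairs for repartitionAndSortWithinPartitions(). The aggregation shown (a sum) is an assumption chosen for brevity.

```python
from itertools import groupby
from operator import itemgetter


def group_by_key(pairs):
    """Mimic groupByKey-style output: every value of a key is buffered at once."""
    buckets = {}
    for key, value in pairs:
        buckets.setdefault(key, []).append(value)
    return buckets


def sorted_pairs(pairs):
    """Mimic one partition after a sort-within-partition shuffle:
    flat (key, value) pairs ordered by key."""
    return sorted(pairs, key=itemgetter(0))


def streaming_sum(sorted_kv):
    """Aggregate consecutive runs of equal keys, holding only a running total."""
    for key, run in groupby(sorted_kv, key=itemgetter(0)):
        total = 0
        for _, value in run:
            total += value
        yield key, total


rows = [("b", 2), ("a", 1), ("b", 3), ("a", 4)]

# Original shape: the full value list of each key sits in memory before summing.
buffered = {key: sum(values) for key, values in group_by_key(rows).items()}

# New shape: the consumer walks sorted pairs and keeps one accumulator per run.
streamed = dict(streaming_sum(sorted_pairs(rows)))
```

Both paths produce the same totals. The difference is in what the consumer must hold: the first keeps a complete value list per key, while the second only needs the current key and its accumulator. Note that sorted() in this sketch still buffers the whole input; it stands in for the shuffle's sort step and is not a claim about how Spark implements it.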



