[ https://issues.apache.org/jira/browse/HIVE-20108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sahil Takiar reassigned HIVE-20108:
-----------------------------------

> Investigate alternatives to groupByKey
> --------------------------------------
>
>                 Key: HIVE-20108
>                 URL: https://issues.apache.org/jira/browse/HIVE-20108
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>
> We use {{groupByKey}} for aggregations (or, if
> {{hive.spark.use.groupby.shuffle}} is false, we use
> {{repartitionAndSortWithinPartitions}}).
> {{groupByKey}} has its drawbacks because it can't spill records within a
> single key group. It also seems to be doing some unnecessary work in Spark's
> {{Aggregator}} (not positive about this part).
> {{repartitionAndSortWithinPartitions}} is better, but the sorting within
> partitions isn't necessary for aggregations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
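The trade-off described in the issue can be modeled in a few lines of plain Python. This is not Hive or Spark code, and it does not model partitioning or spilling. The function names (`group_by_key_sum`, `streaming_sum`, `sort_within_partition`) are hypothetical, and SUM stands in for an arbitrary aggregate. The sketch only illustrates two points: a `groupByKey`-style step buffers every value of a key before aggregating, so memory grows with the largest key group; a reducer that walks contiguous runs of equal keys needs only a running accumulator, and it needs those runs to be adjacent, not sorted.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def group_by_key_sum(records):
    """groupByKey-style (hypothetical model): every value for a key is
    buffered in a list before the aggregate runs, so the peak buffer size
    equals the size of the largest key group."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    peak_buffered = max((len(vals) for vals in groups.values()), default=0)
    totals = {key: sum(vals) for key, vals in groups.items()}
    return totals, peak_buffered


def sort_within_partition(records):
    """repartitionAndSortWithinPartitions-style (hypothetical model):
    a full sort by key, which is one way to make equal keys adjacent."""
    return sorted(records, key=itemgetter(0))


def streaming_sum(contiguous_records):
    """Reducer over contiguous runs of equal keys: keeps only a running
    total per run, never a list of the run's values. Raises ValueError if
    a key reappears after its run ended, i.e. the input was not grouped."""
    result = {}
    for key, run in groupby(contiguous_records, key=itemgetter(0)):
        if key in result:
            raise ValueError(f"key {key!r} is not contiguous")
        total = 0
        for _, value in run:
            total += value
        result[key] = total
    return result
```

For example, `streaming_sum([("b", 1), ("b", 2), ("a", 3)])` returns the same totals as `streaming_sum(sort_within_partition(...))` on the same records: adjacency of equal keys is what the aggregation relies on, which is the sense in which the sort order itself is extra work.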