Sahil Takiar created HIVE-20270:
-----------------------------------

Summary: Don't serialize hashCode for groupByKey
Key: HIVE-20270
URL: https://issues.apache.org/jira/browse/HIVE-20270
Project: Hive
Issue Type: Bug
Components: Spark
Reporter: Sahil Takiar
Assignee: Sahil Takiar
Similar to HIVE-20032, but for {{groupByKey}}.

The tricky part with {{groupByKey}} is that we need to preserve the {{hashCode}} until the key has been partitioned (via the {{HashPartitioner}}). After that, there is no need to preserve it. Spark's {{groupByKey}} operator does require a {{hashCode}}, since it puts everything into a map, but that map can use a different hash code than the one stored in {{HiveKey}`. The hash code in {{HiveKey}} only matters for determining which partition the key is assigned to.

The drawback is that, if the hash code is not serialized, it has to be recomputed for each {{HiveKey}} after the shuffle. This may require more CPU, so we should profile the change to be sure.
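The following is a minimal sketch of the idea above, not Hive's or Spark's actual code. `ShuffleKey` and `GroupByKeySketch` are hypothetical names. `ShuffleKey` stands in for {{HiveKey}}: it carries a Hive-computed hash before the shuffle, serializes only its key bytes, and falls back to `Arrays.hashCode` (an assumed stand-in for whatever hash the reduce side would use) after deserialization. The partition rule mirrors Spark's {{HashPartitioner}} (non-negative `hashCode % numPartitions`).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for HiveKey: key bytes plus an optional Hive-computed hash.
final class ShuffleKey {
    private final byte[] bytes;
    private final Integer hiveHash; // null once the key has been shuffled

    ShuffleKey(byte[] bytes, Integer hiveHash) {
        this.bytes = bytes;
        this.hiveHash = hiveHash;
    }

    // Map side: return the Hive hash so the partitioner routes the key correctly.
    // Reduce side: no Hive hash was shipped, so derive one from the bytes.
    // Assumes hiveHash is a deterministic function of the bytes, so equal keys
    // in the same state always agree on hashCode().
    @Override
    public int hashCode() {
        return hiveHash != null ? hiveHash : Arrays.hashCode(bytes);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ShuffleKey && Arrays.equals(bytes, ((ShuffleKey) o).bytes);
    }

    // Write only the length-prefixed key bytes; the hash code is NOT serialized.
    void write(DataOutputStream out) throws IOException {
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    static ShuffleKey read(DataInputStream in) throws IOException {
        byte[] b = new byte[in.readInt()];
        in.readFully(b);
        return new ShuffleKey(b, null);
    }
}

final class GroupByKeySketch {
    // Same rule as Spark's HashPartitioner: non-negative hashCode mod numPartitions.
    static int partitionFor(Object key, int numPartitions) {
        int mod = key.hashCode() % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }

    // Serialize then deserialize a key, simulating the shuffle.
    static ShuffleKey roundTrip(ShuffleKey key) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            key.write(new DataOutputStream(buf));
            return ShuffleKey.read(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reduce-side grouping: the map relies only on the recomputed hashCode/equals.
    static Map<ShuffleKey, List<String>> group(List<ShuffleKey> keys, List<String> values) {
        Map<ShuffleKey, List<String>> groups = new HashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            groups.computeIfAbsent(keys.get(i), k -> new ArrayList<>()).add(values.get(i));
        }
        return groups;
    }
}
```

In this sketch the reduce-side hash generally differs from the Hive hash. That is harmless, because the partition was already chosen before the shuffle. The cost is the extra `Arrays.hashCode` call per deserialized key, which is the CPU overhead the issue suggests profiling.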