[ https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16542409#comment-16542409 ]
Sahil Takiar commented on HIVE-20032:
-------------------------------------

[~lirui] thanks for taking a look.

So I took a closer look at this, and I think there might be a way to specify custom serializers just for shuffles. However, it requires accessing some lower-level Spark APIs. The idea is that RDD operations such as {{sortByKey}} and {{repartitionAndSortWithinPartitions}} return a {{ShuffledRDD}}. The {{ShuffledRDD}} object has a method called {{setSerializer}} that allows users to set a custom serializer for that RDD. Certain RDD APIs such as {{combineByKey}} let callers pass a custom serializer, which they apply by invoking {{ShuffledRDD#setSerializer}}; however, it doesn't look like {{sortByKey}} or {{repartitionAndSortWithinPartitions}} do. I think this is probably better than my original approach.

The other issue is that specifying a custom serializer doesn't work with the way we currently shade Kryo in {{hive-exec}} (I think you found similar issues while working on HIVE-15104). So I had to remove the relocation for Kryo (which was added in HIVE-5915). Hopefully that's OK since Spark and Hive use the same version of Kryo.

I attached an updated patch (still a WIP) that implements this approach.

> Don't serialize hashCode when groupByShuffle and RDD cacheing is disabled
> -------------------------------------------------------------------------
>
>                 Key: HIVE-20032
>                 URL: https://issues.apache.org/jira/browse/HIVE-20032
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>         Attachments: HIVE-20032.1.patch, HIVE-20032.2.patch, HIVE-20032.3.patch, HIVE-20032.4.patch
>
>
> Follow up on HIVE-15104: if we don't enable RDD caching or groupByShuffles, then we don't need to serialize the hashCode when shuffling data in HoS.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
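
As a rough illustration of the saving the issue summary describes, here is a minimal sketch of a shuffle key written with and without its hash code. This is not Hive's or Spark's actual code: the class {{ShuffleKeyDemo}}, the nested {{Key}} type, and both {{write*}} methods are hypothetical names, and the fixed-width encoding is an assumption chosen only to keep the example self-contained. It uses plain {{java.nio}} rather than Kryo or any Spark API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ShuffleKeyDemo {

    /** Hypothetical stand-in for a shuffle key: payload bytes plus a precomputed hash code. */
    static final class Key {
        final byte[] bytes;
        final int hashCode;

        Key(byte[] bytes, int hashCode) {
            this.bytes = bytes;
            this.hashCode = hashCode;
        }
    }

    /** Writes a 4-byte length, the payload, and then the 4-byte hash code. */
    static byte[] writeWithHash(Key key) {
        return ByteBuffer.allocate(4 + key.bytes.length + 4)
                .putInt(key.bytes.length)
                .put(key.bytes)
                .putInt(key.hashCode)
                .array();
    }

    /** Writes only the 4-byte length and the payload; the hash code is left out. */
    static byte[] writeWithoutHash(Key key) {
        return ByteBuffer.allocate(4 + key.bytes.length)
                .putInt(key.bytes.length)
                .put(key.bytes)
                .array();
    }

    public static void main(String[] args) {
        Key key = new Key("row-key".getBytes(StandardCharsets.UTF_8), 12345);
        System.out.println("with hashCode:    " + writeWithHash(key).length + " bytes");
        System.out.println("without hashCode: " + writeWithoutHash(key).length + " bytes");
    }
}
```

In this sketch the difference is a fixed 4 bytes per key. A real serializer such as Kryo may encode integers differently, so the actual saving per shuffled record in HoS could be smaller or larger.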