[ https://issues.apache.org/jira/browse/HIVE-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740740#comment-13740740 ]
Thejas M Nair commented on HIVE-5093:
-------------------------------------

In my experience with Pig and combiners, combiners can be significantly expensive because the records are serialized and deserialized across the map/combiner boundary. I.e., if the use of a combiner does not reduce the data size beyond some threshold, it will actually slow the query down.
Can you also try "limit 480200;" just to get an idea of what the worst-case performance for this can be?

> Use a combiner for LIMIT with GROUP BY and ORDER BY operators
> -------------------------------------------------------------
>
>                 Key: HIVE-5093
>                 URL: https://issues.apache.org/jira/browse/HIVE-5093
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.12.0
>            Reporter: Gopal V
>            Assignee: Gopal V
>         Attachments: HIVE-5093-WIP-01.patch
>
>
> Operator trees of the following structure can have a memory-friendly combiner
> put in place after the sort phase:
> "GBY-LIM" and "OBY-LIM"
> This will cut down on I/O when spilling to disk and particularly during the
> merge phase of the reducer.
> There are two possible combiners - LimitNKeysCombiner and
> LimitNValuesCombiner.
> The first one would be ideal for the GROUP-BY case, while the latter would
> be more useful for the ORDER-BY case.
> The combiners are still relevant even if there are 1:1 forward operators on
> the reducer side, and for small data items the MR base layer does not run the
> combiners at all.
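
To make the two combiners named in the description concrete, here is a minimal sketch in plain Java of what each one might do to a single key-sorted run of map output. It is not taken from HIVE-5093-WIP-01.patch and does not use the Hadoop or Hive APIs: the class name LimitCombinerSketch and the methods limitNKeys and limitNValues are hypothetical, and the behaviour is an assumption based on the combiner names (keep the rows of the first N distinct keys for GBY-LIM, keep the first N rows for OBY-LIM).

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the code from the attached patch.
public class LimitCombinerSketch {

    // GROUP BY + LIMIT n (assumed behaviour): keep every record that belongs
    // to the first n distinct keys of a run that is already sorted by key.
    public static <K, V> List<Map.Entry<K, V>> limitNKeys(
            List<Map.Entry<K, V>> sortedRun, int n) {
        List<Map.Entry<K, V>> out = new ArrayList<>();
        K previousKey = null;
        int distinctKeys = 0;
        for (Map.Entry<K, V> record : sortedRun) {
            boolean newKey = distinctKeys == 0 || !record.getKey().equals(previousKey);
            if (newKey) {
                if (distinctKeys == n) {
                    break; // an (n+1)-th key cannot reach the final result
                }
                distinctKeys++;
                previousKey = record.getKey();
            }
            out.add(record);
        }
        return out;
    }

    // ORDER BY + LIMIT n (assumed behaviour): keep only the first n records
    // of the sorted run, regardless of how many distinct keys they span.
    public static <K, V> List<Map.Entry<K, V>> limitNValues(
            List<Map.Entry<K, V>> sortedRun, int n) {
        int end = Math.min(Math.max(n, 0), sortedRun.size());
        return new ArrayList<>(sortedRun.subList(0, end));
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> run = new ArrayList<>();
        run.add(new AbstractMap.SimpleEntry<>("a", 1));
        run.add(new AbstractMap.SimpleEntry<>("a", 2));
        run.add(new AbstractMap.SimpleEntry<>("b", 3));
        run.add(new AbstractMap.SimpleEntry<>("c", 4));
        System.out.println(limitNKeys(run, 2));
        System.out.println(limitNValues(run, 2));
    }
}
```

Either filter can only shrink a run, which is where the saving in spill and merge I/O would come from. When the limit is close to the number of rows, nothing is dropped and only the serialization cost raised in the comment remains.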