[ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576831#comment-13576831 ]
Ashutosh Chauhan commented on HIVE-3972:
----------------------------------------

[~navis] I agree that HIVE-3562 is an orthogonal issue that will make what I am suggesting less of a concern, but there are still some cases it does not cover. As is being discussed on HIVE-3562, consider the following query:
{code}
select value, sum(key) as sum from src group by value order by value limit 10;
{code}
In this case, the limit can't be pushed into the map phase, so the HIVE-3562 optimization won't kick in. With the patch as it currently stands on this jira, we will generate one MR job with multiple reducers and then do the order-by on the client in the Fetch task. If you don't take advantage of the fact that the query has a limit, you might read millions of rows from HDFS, bring all of them into client memory, and then show just 10 to the user. If you instead take the limit into account and stop merging and reading as soon as you have seen 10 rows, you save on both HDFS I/O and client memory. Make sense?

> Support using multiple reducer for fetching order by results
> ------------------------------------------------------------
>
>                 Key: HIVE-3972
>                 URL: https://issues.apache.org/jira/browse/HIVE-3972
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>         Attachments: HIVE-3972.D8349.1.patch, HIVE-3972.D8349.2.patch,
> HIVE-3972.D8349.3.patch
>
>
> Queries that fetch results and end with an "order by" clause make the final
> MR job run with a single reducer, which can be too much. For example,
> {code}
> select value, sum(key) as sum from src group by value order by sum;
> {code}
> If the number of reducers is reasonable, multiple result files could be merged
> into a single sorted stream at the fetcher level.
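
The early-stop idea from the comment above can be sketched as a k-way merge that stops pulling rows once the limit is reached. This is only an illustration and not code from the HIVE-3972 patch: the class name `LimitedMerge`, the method `mergeWithLimit`, and the use of in-memory iterators in place of per-reducer result files are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: merge several already-sorted inputs (standing in for
// per-reducer result files) and stop as soon as `limit` rows were produced.
public class LimitedMerge {

    /** One sorted input together with the row currently at its head. */
    private static final class Source<T> {
        final Iterator<T> rows;
        T head;

        Source(Iterator<T> rows) {
            this.rows = rows;
            this.head = rows.next(); // caller guarantees the input is non-empty
        }
    }

    public static <T> List<T> mergeWithLimit(List<Iterator<T>> sortedInputs,
                                             Comparator<? super T> order,
                                             int limit) {
        // Min-heap keyed on the head row of each input.
        PriorityQueue<Source<T>> heap =
            new PriorityQueue<>((a, b) -> order.compare(a.head, b.head));
        for (Iterator<T> input : sortedInputs) {
            if (input.hasNext()) {
                heap.add(new Source<>(input));
            }
        }

        List<T> result = new ArrayList<>();
        while (result.size() < limit && !heap.isEmpty()) {
            Source<T> smallest = heap.poll();
            result.add(smallest.head);
            // Advance this input only if more rows are still needed, so no
            // further rows are read once the limit has been reached.
            if (result.size() < limit && smallest.rows.hasNext()) {
                smallest.head = smallest.rows.next();
                heap.add(smallest);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Iterator<String>> inputs = Arrays.asList(
            Arrays.asList("a", "d", "g").iterator(),
            Arrays.asList("b", "e").iterator(),
            Arrays.asList("c", "f", "h").iterator());
        // Only the first 4 rows in sorted order are returned.
        System.out.println(
            mergeWithLimit(inputs, Comparator.<String>naturalOrder(), 4));
        // prints [a, b, c, d]
    }
}
```

With k inputs and a limit of n, this sketch reads at most k + n - 1 rows in total and keeps at most k head rows in the heap, rather than loading every reducer's full output into client memory.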