[ 
https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576831#comment-13576831
 ] 

Ashutosh Chauhan commented on HIVE-3972:
----------------------------------------

[~navis] I agree that HIVE-3562 is an orthogonal issue and that it will make 
what I am suggesting less of a problem, but some cases remain. As is being 
discussed on HIVE-3562, consider the following query: 
{code}
select value, sum(key) as sum from src group by value order by value limit 10;
{code}
In this case the limit can't be pushed into the map phase, so the HIVE-3562 
optimization won't kick in. With the patch as it currently stands on this jira, 
we will generate one MR job with multiple reducers and then do the order-by on 
the client in the Fetch task. If you don't take advantage of the limit in the 
query, you may read millions of rows from HDFS, bring all of them into client 
memory, and then show just 10 to the user. If you instead take the limit into 
account and stop merging and reading as soon as you have seen 10 rows, you save 
on both HDFS IO and client memory. Does that make sense? 
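
To illustrate the idea, here is a minimal sketch in Python rather than Hive's 
actual Fetch task code. It assumes each reducer output is already sorted by 
the order-by key. The names counting_reader, fetch_with_limit, reducer_files 
and rows_read are hypothetical, and the sample rows are made up. The merge is 
lazy, so it pulls rows only until the limit is reached.

```python
import heapq
from itertools import islice


def counting_reader(rows, counter):
    """Yield rows one at a time, counting how many were actually read."""
    for row in rows:
        counter[0] += 1
        yield row


def fetch_with_limit(sorted_outputs, limit):
    """Lazily merge already-sorted inputs and stop after `limit` rows."""
    # heapq.merge pulls from each input only when it needs the next row.
    merged = heapq.merge(*sorted_outputs, key=lambda row: row[0])
    # islice stops asking for rows once `limit` of them have been produced.
    return list(islice(merged, limit))


# Hypothetical (value, sum) rows; each list stands in for one reducer's
# output file and is already sorted by value.
reducer_files = [
    [("a", 3), ("d", 1), ("g", 9)],
    [("b", 5), ("e", 2), ("h", 4)],
    [("c", 7), ("f", 8), ("i", 6)],
]
rows_read = [0]
readers = [counting_reader(rows, rows_read) for rows in reducer_files]
top = fetch_with_limit(readers, limit=2)
print(top, "rows read:", rows_read[0])
```

With a limit, the number of rows read grows with the number of files plus the 
limit, not with the total size of the reducer outputs.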
                
> Support using multiple reducer for fetching order by results
> ------------------------------------------------------------
>
>                 Key: HIVE-3972
>                 URL: https://issues.apache.org/jira/browse/HIVE-3972
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>         Attachments: HIVE-3972.D8349.1.patch, HIVE-3972.D8349.2.patch, 
> HIVE-3972.D8349.3.patch
>
>
> Queries that fetch results and end with an "order by" clause run the final 
> MR job with a single reducer, which can be too much for it. For example, 
> {code}
> select value, sum(key) as sum from src group by value order by sum;
> {code}
> If the number of reducers is reasonable, the multiple result files could be 
> merged into a single sorted stream at the fetcher level.

