[jira] [Commented] (HIVE-7277) how to decide reduce numbers according to the input size of reduce stage rather than the input size of map stage?

wangmeng (JIRA) Mon, 23 Jun 2014 23:03:07 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041741#comment-14041741
 ]


wangmeng commented on HIVE-7277:
--------------------------------

As far as I know, Tez is a new compute engine, different from MapReduce. Is
there any solution based on the MapReduce engine?

> how to decide reduce numbers according to the input size of reduce stage 
> rather than the input size of map stage?
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7277
>                 URL: https://issues.apache.org/jira/browse/HIVE-7277
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: wangmeng
>             Fix For: 0.13.0
>
>
> As we know, Hive currently decides the number of reducers solely as
> (input size of the map stage) / hive.exec.reducers.bytes.per.reducer
> (default 1G).
> However, I think the output size of the map stage may differ greatly from
> the original input size, so this strategy for deciding the number of
> reducers may be improper.
> Is there any feature that can decide the number of reducers according to
> the output of the map stage? Thanks.
> As far as I know, the reduce stage actually begins soon after some map
> tasks have finished, rather than waiting until the whole map stage has
> finished, so I think it is also improper to decide the number of reducers
> only once the whole map stage has finished.
> As someone pointed out, we could use the output size of the earliest
> finished map tasks to estimate the total number of reducers. However, Hive
> now uses filter pushdown (WHERE), which may result in large differences in
> output size between map tasks.
> So this estimation is improper too.
> Thanks.
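
To make the concern in the description concrete, here is a rough Python sketch of the arithmetic. It is not Hive's code: the function names, the sizes, and the cap of 999 (standing in for hive.exec.reducers.max) are assumptions for illustration only. It compares the reducer count derived from the map input size with the count derived from the map output size, and shows how extrapolating from the earliest finished map tasks can be skewed when a pushed-down filter makes some tasks emit far less than others.

```python
import math

# "default 1G" as stated in the description (hive.exec.reducers.bytes.per.reducer).
BYTES_PER_REDUCER = 1_000_000_000
# Assumed cap, standing in for hive.exec.reducers.max; adjust to your setup.
MAX_REDUCERS = 999


def estimate_reducers(total_bytes, bytes_per_reducer=BYTES_PER_REDUCER,
                      max_reducers=MAX_REDUCERS):
    """Return ceil(total_bytes / bytes_per_reducer), clamped to [1, max_reducers]."""
    wanted = math.ceil(total_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, wanted))


def extrapolate_map_output(finished_output_bytes, total_map_tasks):
    """Scale the mean output of the finished map tasks up to all map tasks."""
    mean = sum(finished_output_bytes) / len(finished_output_bytes)
    return int(mean * total_map_tasks)


# Hypothetical job: 50 GB of map input, but only 2 GB of map output.
map_input_bytes = 50 * 10**9
map_output_bytes = 2 * 10**9
by_input = estimate_reducers(map_input_bytes)
by_output = estimate_reducers(map_output_bytes)

# Hypothetical skew: 100 map tasks; the 5 that finish first emit 1 MB each
# because the filter drops most of their rows, the other 95 emit 100 MB each.
early_outputs = [10**6] * 5
late_outputs = [10**8] * 95
guessed_total = extrapolate_map_output(early_outputs, total_map_tasks=100)
actual_total = sum(early_outputs) + sum(late_outputs)
by_early_sample = estimate_reducers(guessed_total)
by_actual_output = estimate_reducers(actual_total)

print(by_input, by_output)                 # input-based vs. output-based count
print(by_early_sample, by_actual_output)   # early-sample guess vs. real output
```

With these made-up numbers the input-based count is far larger than the output-based one, and the early-sample guess underestimates the count that the real map output would call for.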



--
This message was sent by Atlassian JIRA
(v6.2#6252)
