wangmeng created HIVE-7277:
------------------------------

             Summary: How to decide the number of reducers according to the input 
size of the reduce stage rather than the input size of the map stage?
                 Key: HIVE-7277
                 URL: https://issues.apache.org/jira/browse/HIVE-7277
             Project: Hive
          Issue Type: New Feature
            Reporter: wangmeng
             Fix For: 0.13.0


As we know, Hive currently decides the number of reducers solely as
"input size of the map stage / hive.exec.reducers.bytes.per.reducer" (default 1 GB).
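
To make this rule concrete, here is a minimal Python sketch of the calculation as described in this issue. It is not Hive's actual code: the function name `estimate_reducers` is hypothetical, and the upper bound `max_reducers` (standing in for `hive.exec.reducers.max`) and its value of 999 are assumptions added for illustration.

```python
def estimate_reducers(map_input_bytes,
                      bytes_per_reducer=1_000_000_000,  # "default 1G" from the issue text
                      max_reducers=999):                # assumed cap, for illustration only
    """Estimate the reducer count from the map-stage INPUT size only."""
    if map_input_bytes <= 0:
        return 1
    # Integer ceiling division avoids floating-point rounding.
    wanted = -(-map_input_bytes // bytes_per_reducer)
    # Always use at least one reducer and never exceed the cap.
    return max(1, min(max_reducers, wanted))
```

For example, `estimate_reducers(10_000_000_000)` yields 10 reducers for a 10 GB input, no matter how much data the map tasks actually emit. That is the mismatch this issue is about.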

However, I think the output size of the map stage may differ greatly from 
the original input size, so this strategy for deciding the number of reducers 
may be improper.

So is there any feature that can decide the number of reducers according 
to the output of the map stage? Thanks.

As far as I know, the reduce stage actually begins right after some map tasks 
have finished, rather than waiting until the whole map stage has finished, so I 
think it is also improper to decide the number of reducers only once the whole 
map stage has finished.

As someone pointed out, we could estimate the total number of reducers from 
the output size of the earliest map tasks to finish. However, Hive now uses 
filter pushdown (WHERE), which may result in a big difference between the 
outputs of individual map tasks.

So this estimation is improper as well.
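
The concern about extrapolating from the earliest finished map tasks can be illustrated with a small Python sketch. Everything here is hypothetical: the function name `extrapolate_map_output` and the per-task byte counts are made up for illustration and are not taken from Hive or from any measurement.

```python
def extrapolate_map_output(finished_output_bytes, total_map_tasks):
    """Scale the mean output of the finished map tasks up to all map tasks."""
    if not finished_output_bytes:
        raise ValueError("need at least one finished map task")
    # mean output per finished task * total number of tasks (integer arithmetic)
    return sum(finished_output_bytes) * total_map_tasks // len(finished_output_bytes)


# Hypothetical job with 10 map tasks. The two tasks that finish first read
# splits where the pushed-down filter rejects almost every row.
early_tasks = [1_000_000, 2_000_000]   # bytes emitted by the first two tasks
late_tasks = [500_000_000] * 8         # bytes emitted by the remaining eight

estimated_total = extrapolate_map_output(early_tasks, total_map_tasks=10)
actual_total = sum(early_tasks) + sum(late_tasks)
# estimated_total is 15_000_000 while actual_total is 4_003_000_000
```

In this made-up case the extrapolated total is more than two orders of magnitude below the real map output, so a reducer count derived from it would be far too small.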

Thanks.




--
This message was sent by Atlassian JIRA
(v6.2#6252)
