[ https://issues.apache.org/jira/browse/HIVE-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041741#comment-14041741 ]
wangmeng commented on HIVE-7277:
--------------------------------

As far as I know, Tez is a new compute engine, different from MapReduce. Is there any solution based on the MapReduce engine?

> How to decide the number of reducers according to the input size of the reduce stage
> rather than the input size of the map stage?
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-7277
>                 URL: https://issues.apache.org/jira/browse/HIVE-7277
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: wangmeng
>             Fix For: 0.13.0
>
>
> As we know, Hive currently decides the number of reducers just by
> "input size of map / hive.exec.reducers.bytes.per.reducer" (default 1G).
> But I think the output size of the map stage may differ greatly from
> the original input size, so this strategy for deciding the number of
> reducers may be improper.
> So is there any feature that can decide the number of reducers according
> to the output of the map stage? Thanks.
> As far as I know, the reduce stage actually begins just after some map
> tasks have finished, rather than waiting until the whole map stage has
> finished, so I think it is also improper to decide the number of reducers
> only when the whole map stage has finished.
> As someone pointed out, we could use the output size of the earliest
> finished map tasks to estimate the total number of reducers. However,
> Hive now uses filter pushdown (WHERE), which may result in a big
> difference between map tasks.
> So this estimation is improper too.
> Thanks.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
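
To make the concern in the issue concrete, here is a minimal sketch of the size-based estimate it describes: input size divided by hive.exec.reducers.bytes.per.reducer, rounded up. The function name `estimate_reducers` is hypothetical and this is not Hive's actual code. The 1 GB default is taken from the issue text. The cap of 999 is an assumption about the default of hive.exec.reducers.max in Hive releases of that period. The 50 GB and 2 GB sizes are made-up illustration values.

```python
def estimate_reducers(input_bytes, bytes_per_reducer=1_000_000_000, max_reducers=999):
    """Estimate a reducer count from a byte size (hypothetical helper).

    bytes_per_reducer mirrors hive.exec.reducers.bytes.per.reducer
    (1 GB default, as stated in the issue). max_reducers is an assumed
    stand-in for hive.exec.reducers.max.
    """
    if bytes_per_reducer <= 0:
        raise ValueError("bytes_per_reducer must be positive")
    # Integer ceiling division, so no floating-point rounding is involved.
    reducers = (input_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    # Always use at least one reducer and never exceed the cap.
    return max(1, min(max_reducers, reducers))


if __name__ == "__main__":
    map_input_bytes = 50 * 10**9   # illustrative: 50 GB read by the map stage
    map_output_bytes = 2 * 10**9   # illustrative: 2 GB left after a selective WHERE filter

    print("estimate from map input: ", estimate_reducers(map_input_bytes))
    print("estimate from map output:", estimate_reducers(map_output_bytes))
```

With these illustrative sizes, the input-based estimate asks for 50 reducers while the map output would justify only 2, which is the mismatch the issue is about.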