You can control when the reduce tasks start in the first place. The
parameter mapred.reduce.slowstart.completed.maps
specifies the fraction of the job's map tasks that must be
complete before reduces are scheduled for the job. For example, if you
set this to 0.70, reduce tasks will start after 70% of the mappers have
completed. Remember that the actual 'reduce' phase of the reducer will not
start until all the mappers have completed; until then, a started reducer
only copies (shuffles) map outputs. When this value is set too low, reduce
tasks start while many mappers have yet to complete. This can result in a
lot of killed reduce task attempts, as they sit waiting for the outputs
from all the mappers to become available.
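
To make the threshold concrete, here is a minimal Python sketch of how the
fraction translates into a number of finished map tasks. It is only an
illustration, not Hadoop's own code: the function names are made up for this
example, rounding up is an assumption, and the 0.05 default is what I
remember from Hadoop 1.x, so check it against your distribution.

```python
import math


def maps_required_before_reduces(total_maps: int, slowstart: float = 0.05) -> int:
    """Map tasks that must finish before reduces may be scheduled.

    `slowstart` stands in for mapred.reduce.slowstart.completed.maps.
    """
    if total_maps < 0:
        raise ValueError("total_maps must not be negative")
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be between 0.0 and 1.0")
    # Assumption: round up, so a fractional threshold is never met early.
    return math.ceil(slowstart * total_maps)


def reduces_may_start(completed_maps: int, total_maps: int,
                      slowstart: float = 0.05) -> bool:
    """True once enough maps have completed for reduces to be scheduled."""
    return completed_maps >= maps_required_before_reduces(total_maps, slowstart)


if __name__ == "__main__":
    # Hypothetical job with 200 map tasks and the 0.70 setting from above.
    print(maps_required_before_reduces(200, 0.70))
    print(reduces_may_start(166, 200, 0.70))
```

With a low setting the threshold is reached almost immediately, which is why
a reduce percentage can show up while the map percentage is still well below
100.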

Thanks & Regards,
Rakesh


On Wed, Apr 23, 2014 at 7:43 AM, Chi Huynh <hu...@initions.com> wrote:

> The MapReduce job contains a shuffle phase, in which the intermediate map
> outputs are copied to the reducer nodes. This phase is counted as part of
> the reduce phase; therefore the reduce counter already starts advancing
> before the map phase has finished. The actual reduce task will be started,
> just as you have heard, when all the map tasks are finished.
>
>
> On Wednesday, April 23, 2014 1:18:40 PM UTC+2, Kishore kumar wrote:
>>
>> Hi All,
>>
>> I heard that the reduce job is started only after all map tasks have
>> finished 100%, but in my Hive query the reduce job started at the stage
>> below. Please explain why this is. (I copied the line below while the job
>> was running.)
>>
>> 2014-04-22 21:15:12,803 Stage-1 map = 83%, reduce = 1%, Cumulative CPU
>> 4194.4 sec
>>
>> --
>>
>>
>> *Kishore *
>>
>
