Task scheduler
Hi, I am experimenting with Hadoop. Is the task distribution policy of the JobTracker pluggable? If yes, where in the code tree is it defined? Thanks and regards, Saurabh Agarwal
Re: Task scheduler
Saurabh,

> I am experimenting with Hadoop. Is the task distribution policy of the
> JobTracker pluggable? If yes, where in the code tree is it defined?

Take a look at o.a.h.mapred.TaskScheduler. That's the abstract class that needs to be extended to define a new scheduling policy. Also, please take a look at the existing schedulers that extend this class. There are 3-4 implementations, including the default scheduler, capacity scheduler, fairshare scheduler and dynamic priority scheduler. It may be worthwhile to see whether your ideas match any of the existing implementations to some degree and, if so, to consider enhancing those as a first option.

Thanks
Hemanth
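To make the "extend the abstract class" idea concrete, here is a minimal, self-contained sketch of the plugin pattern. It does not use Hadoop's real classes: SketchScheduler, FifoSketchScheduler and load are hypothetical names invented for illustration, and the method signature is an assumption, not the signature of o.a.h.mapred.TaskScheduler. In the 0.20-era releases the JobTracker is, as far as I know, pointed at a scheduler class through the mapred.jobtracker.taskScheduler property; the load method below only imitates that kind of class-name lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SchedulerSketch {

    /** Hypothetical stand-in for an abstract scheduler base class (not Hadoop's real API). */
    public abstract static class SketchScheduler {
        /** Returns the tasks to hand to a tracker that reports freeSlots idle slots. */
        public abstract List<String> assignTasks(int freeSlots, List<String> pendingTasks);
    }

    /** One concrete policy: hand out pending tasks in arrival (FIFO) order. */
    public static class FifoSketchScheduler extends SketchScheduler {
        public FifoSketchScheduler() {}

        @Override
        public List<String> assignTasks(int freeSlots, List<String> pendingTasks) {
            List<String> assigned = new ArrayList<String>();
            // Take tasks from the head of the queue until the slots are filled.
            while (assigned.size() < freeSlots && !pendingTasks.isEmpty()) {
                assigned.add(pendingTasks.remove(0));
            }
            return assigned;
        }
    }

    /** Instantiates the policy named by a class-name string, as a config-driven loader might. */
    public static SketchScheduler load(String className) throws Exception {
        return (SketchScheduler) Class.forName(className)
                .getDeclaredConstructor()
                .newInstance();
    }

    public static void main(String[] args) throws Exception {
        // The class name would normally come from configuration.
        SketchScheduler scheduler = load(FifoSketchScheduler.class.getName());
        List<String> pending = new ArrayList<String>(Arrays.asList("m_0", "m_1", "m_2"));
        System.out.println("assigned: " + scheduler.assignTasks(2, pending));
        System.out.println("still pending: " + pending);
    }
}
```

Swapping the policy then means writing another subclass and changing the class name that is passed to the loader; the caller does not change.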
Re: Task scheduler
Hi Hemanth,

Let me reframe my question. I wanted to know how the JobTracker decides the assignment of input splits to TaskTrackers based on each TaskTracker's data locality. Where is this policy defined? Is it pluggable?

Saurabh Agarwal

On Fri, May 14, 2010 at 7:04 AM, Hemanth Yamijala wrote:

> Saurabh,
>
> > I am experimenting with Hadoop. Is the task distribution policy of the
> > JobTracker pluggable? If yes, where in the code tree is it defined?
>
> Take a look at o.a.h.mapred.TaskScheduler. That's the abstract class
> that needs to be extended to define a new scheduling policy. Also,
> please do take a look at the existing schedulers that extend this
> class. There are 3-4 implementations including the default scheduler,
> capacity scheduler, fairshare scheduler and dynamic priority
> scheduler. It may be worthwhile to see if your ideas match any of the
> existing implementations to some degree and then consider enhancing
> those as a first option.
>
> Thanks
> Hemanth
Re: Task scheduler
Saurabh,

> Let me reframe my question. I wanted to know how the JobTracker decides the
> assignment of input splits to TaskTrackers based on each TaskTracker's data
> locality. Where is this policy defined? Is it pluggable?

Sorry, I misunderstood your question then. This code is in o.a.h.mapred.JobInProgress. It is likely spread across many methods in the class, but a good starting point would be methods like obtainNewMapTask or obtainNewReduceTask.

At the moment, this policy is not pluggable. But I know there have been discussions (possibly even a JIRA, though I can't locate any now) asking for this capability.

Thanks
Hemanth
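For readers who want a feel for what a locality-based choice looks like before reading JobInProgress, here is a rough, self-contained sketch. It is not the code from obtainNewMapTask; it only illustrates the general idea of preferring a split stored on the requesting tracker's own host, then one stored on the same rack, then any remaining split. The Split class, the pickSplit method and the host-to-rack map are all hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LocalitySketch {

    /** Hypothetical stand-in for an input split and the hosts that store its data. */
    public static class Split {
        public final String id;
        public final List<String> hosts;

        public Split(String id, String... hosts) {
            this.id = id;
            this.hosts = Arrays.asList(hosts);
        }
    }

    /** Picks a pending split for the tracker: node-local first, then rack-local, then any. */
    public static Split pickSplit(String trackerHost,
                                  Map<String, String> rackOf,
                                  List<Split> pending) {
        // 1. Node-local: the tracker's own host holds a replica of the data.
        for (Split s : pending) {
            if (s.hosts.contains(trackerHost)) {
                return s;
            }
        }
        // 2. Rack-local: some replica lives on the same rack as the tracker.
        String trackerRack = rackOf.get(trackerHost);
        if (trackerRack != null) {
            for (Split s : pending) {
                for (String h : s.hosts) {
                    if (trackerRack.equals(rackOf.get(h))) {
                        return s;
                    }
                }
            }
        }
        // 3. No locality available: fall back to the first pending split, if any.
        return pending.isEmpty() ? null : pending.get(0);
    }

    public static void main(String[] args) {
        Map<String, String> rackOf = new HashMap<String, String>();
        rackOf.put("a1", "/rackA");
        rackOf.put("a2", "/rackA");
        rackOf.put("b1", "/rackB");

        List<Split> pending = new ArrayList<Split>();
        pending.add(new Split("split-0", "a1"));
        pending.add(new Split("split-1", "b1"));

        System.out.println(pickSplit("b1", rackOf, pending).id); // node-local match
        System.out.println(pickSplit("a2", rackOf, pending).id); // rack-local match
    }
}
```

The real class also has to deal with things this sketch ignores, such as removing assigned splits, failed and speculative tasks, and reduce tasks, which have no input-split locality of this kind.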