Saurabh,

> Let me reframe my question. I wanted to know how the JobTracker decides the
> assignment of input splits to TaskTrackers based on a TaskTracker's data
> locality. Where is this policy defined? Is it pluggable?
Sorry, I misunderstood your question then. This code is in
o.a.h.mapred.JobInProgress. It is likely spread across many methods in the
class, but a good starting point could be methods like obtainNewMapTask or
obtainNewReduceTask. At the moment, this policy is not pluggable, but I know
there have been discussions (possibly even a JIRA, though I can't locate one
now) asking for this capability.

Thanks
Hemanth

> On Fri, May 14, 2010 at 7:04 AM, Hemanth Yamijala <yhema...@gmail.com> wrote:
>
>> Saurabh,
>>
>> > I am experimenting with Hadoop. I wanted to ask whether the task
>> > distribution policy used by the JobTracker is pluggable, and if so,
>> > where in the code tree it is defined.
>>
>> Take a look at o.a.h.mapred.TaskScheduler. That's the abstract class
>> that needs to be extended to define a new scheduling policy. Also,
>> please take a look at the existing schedulers that extend this class.
>> There are 3-4 implementations, including the default scheduler, the
>> capacity scheduler, the fair scheduler, and the dynamic priority
>> scheduler. It may be worthwhile to see whether your ideas match any of
>> the existing implementations to some degree, and then consider
>> enhancing those as a first option.
>>
>> Thanks
>> Hemanth
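[Editor's note: the locality preference discussed above can be illustrated with a small sketch. This is NOT the actual JobInProgress code; it is a simplified, self-contained model of the node-local > rack-local > off-rack ordering that JobInProgress applies when handing out map tasks. All names here (LocalityPicker, Split, pickSplitFor, the single-rack simplification) are hypothetical.]

```java
import java.util.List;

public class LocalityPicker {
    /** A map input split and the hosts holding its block replicas (simplified). */
    static class Split {
        final String id;
        final List<String> hosts;   // datanodes with a replica of the block
        final String rack;          // rack of those replicas (simplified to one rack)
        Split(String id, List<String> hosts, String rack) {
            this.id = id; this.hosts = hosts; this.rack = rack;
        }
    }

    /**
     * Pick the best pending split for a given tracker: prefer a split with a
     * replica on the tracker's own host (node-local), then one on the same
     * rack (rack-local), then fall back to any remaining split.
     */
    static Split pickSplitFor(String trackerHost, String trackerRack, List<Split> pending) {
        Split rackLocal = null, any = null;
        for (Split s : pending) {
            if (s.hosts.contains(trackerHost)) return s;               // node-local wins
            if (rackLocal == null && s.rack.equals(trackerRack)) rackLocal = s;
            if (any == null) any = s;
        }
        return rackLocal != null ? rackLocal : any;                    // may be null if none pending
    }
}
```

The real implementation additionally maintains per-level caches of runnable maps and accounts for speculative execution, but the preference ordering is the core idea.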
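[Editor's note: for the earlier question in the thread, a toy sketch of what "pluggable scheduling" means may help. Hadoop's real extension point is the abstract class o.a.h.mapred.TaskScheduler mentioned above; the interface and names below (SchedulingPolicy, FifoPolicy) are hypothetical stand-ins, not Hadoop's API.]

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class SchedulerPlugin {
    /** A pluggable policy: given a tracker's free slots and the task queue,
     *  decide which task ids to hand out on this heartbeat. */
    interface SchedulingPolicy {
        List<String> assignTasks(int freeSlots, Deque<String> queuedTasks);
    }

    /** Simplest possible policy: hand out queued tasks in FIFO order,
     *  roughly what Hadoop's default JobQueueTaskScheduler does per job queue. */
    static class FifoPolicy implements SchedulingPolicy {
        public List<String> assignTasks(int freeSlots, Deque<String> queuedTasks) {
            List<String> assigned = new ArrayList<>();
            while (assigned.size() < freeSlots && !queuedTasks.isEmpty()) {
                assigned.add(queuedTasks.pollFirst());
            }
            return assigned;
        }
    }
}
```

Swapping in a capacity- or fairness-aware policy then just means providing a different SchedulingPolicy implementation; in real Hadoop this is configured via the mapred.jobtracker.taskScheduler property.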