Hi Hemanth,

Let me reframe my question: I wanted to know how the JobTracker decides the assignment of input splits to TaskTrackers based on each TaskTracker's data locality. Where is this policy defined? Is it pluggable?

Saurabh Agarwal
On Fri, May 14, 2010 at 7:04 AM, Hemanth Yamijala <yhema...@gmail.com> wrote:
> Saurabh,
>
> > I am experimenting with Hadoop. Wanted to ask whether the task
> > distribution policy used by the JobTracker is pluggable, and if yes,
> > where in the code tree it is defined.
>
> Take a look at o.a.h.mapred.TaskScheduler. That's the abstract class
> that needs to be extended to define a new scheduling policy. Also,
> please do take a look at the existing schedulers that extend this
> class. There are 3-4 implementations, including the default scheduler,
> capacity scheduler, fairshare scheduler and dynamic priority
> scheduler. It may be worthwhile to see if your ideas match any of the
> existing implementations to some degree and then consider enhancing
> those as a first option.
>
> Thanks
> Hemanth
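
For anyone finding this thread later, here is a minimal sketch of what extending TaskScheduler looks like. It assumes the 0.20-era API (the exact assignTasks signature has changed across versions) and uses a hypothetical class name, so treat it as illustrative rather than a working scheduler; the real implementations mentioned above (capacity, fairshare, dynamic priority) are the authoritative examples to study.

package org.apache.hadoop.mapred;  // the existing schedulers also live in this package

import java.io.IOException;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical skeleton of a pluggable scheduling policy. The JobTracker
 * calls assignTasks() on every TaskTracker heartbeat, and whatever tasks
 * this method returns are handed to that tracker.
 */
public class MyTaskScheduler extends TaskScheduler {

  @Override
  public List<Task> assignTasks(TaskTrackerStatus taskTracker)
      throws IOException {
    // A real policy would walk the jobs known to the TaskTrackerManager
    // handed to this scheduler, ask each JobInProgress for a map task
    // whose input split is local to this tracker, and fall back to
    // non-local tasks otherwise. Returning an empty list simply assigns
    // nothing on this heartbeat.
    return Collections.<Task>emptyList();
  }

  @Override
  public Collection<JobInProgress> getJobs(String queueName) {
    // Report the jobs this scheduler is currently managing for the queue.
    return Collections.<JobInProgress>emptyList();
  }
}

The JobTracker picks up the scheduler class from the mapred.jobtracker.taskScheduler property (the default is the FIFO JobQueueTaskScheduler), so switching policies is a configuration change plus having the jar on the JobTracker classpath. On the original locality question: if I remember the code correctly, the scheduler mostly decides which job gets served next, while the split-to-tracker locality matching happens inside JobInProgress (obtainNewMapTask and related methods), which consults the split locations recorded for the job.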