Task scheduler

2010-05-13 Thread Saurabh Agarwal
Hi,

I am experimenting with Hadoop. Is the task distribution policy used by the
JobTracker pluggable? If so, where in the code tree is it defined?


Thanks and regards
Saurabh Agarwal


Re: Task scheduler

2010-05-13 Thread Hemanth Yamijala
Saurabh,

> I am experimenting with Hadoop. Is the task distribution policy used by the
> JobTracker pluggable? If so, where in the code tree is it defined?
>

Take a look at o.a.h.mapred.TaskScheduler (o.a.h is short for
org.apache.hadoop). That's the abstract class that needs to be extended
to define a new scheduling policy. Also, please take a look at the
existing schedulers that extend this class. Existing implementations
include the default scheduler, the capacity scheduler, the fair
scheduler and the dynamic priority scheduler. It may be worthwhile to
see whether your ideas match any of the existing implementations to
some degree, and to consider enhancing one of those as a first option.
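
To make the "extend the abstract class" idea concrete, here is a minimal,
self-contained sketch of the pattern. Everything in it (SchedulerSketch,
SketchTaskScheduler, PendingTask, FifoScheduler, PriorityScheduler, and the
assignTasks signature shown) is a hypothetical stand-in written for
illustration, not the real Hadoop API. The real TaskScheduler works with
Hadoop-specific types, so check the source of your release for the actual
method signatures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-ins. These are NOT the real Hadoop classes.
class SchedulerSketch {

    /** A pending task with a name and a priority (higher = more urgent, by assumption). */
    static final class PendingTask {
        final String name;
        final int priority;

        PendingTask(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }
    }

    /** Plays the role of the abstract base class that a new policy extends. */
    abstract static class SketchTaskScheduler {
        /** Picks at most freeSlots tasks from the queue for one tracker. */
        abstract List<PendingTask> assignTasks(List<PendingTask> queue, int freeSlots);

        /** Shared helper: copy the first n entries, clamped to the list size. */
        static List<PendingTask> firstN(List<PendingTask> tasks, int n) {
            int count = Math.min(Math.max(n, 0), tasks.size());
            return new ArrayList<>(tasks.subList(0, count));
        }
    }

    /** FIFO policy: hand out tasks in submission order. */
    static final class FifoScheduler extends SketchTaskScheduler {
        @Override
        List<PendingTask> assignTasks(List<PendingTask> queue, int freeSlots) {
            return firstN(queue, freeSlots);
        }
    }

    /** Priority policy: highest priority first; the stable sort keeps ties in submission order. */
    static final class PriorityScheduler extends SketchTaskScheduler {
        @Override
        List<PendingTask> assignTasks(List<PendingTask> queue, int freeSlots) {
            List<PendingTask> sorted = new ArrayList<>(queue);
            sorted.sort((a, b) -> Integer.compare(b.priority, a.priority));
            return firstN(sorted, freeSlots);
        }
    }

    /** Joins task names with commas, for printing. */
    static String names(List<PendingTask> tasks) {
        StringBuilder sb = new StringBuilder();
        for (PendingTask t : tasks) {
            if (sb.length() > 0) sb.append(',');
            sb.append(t.name);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<PendingTask> queue = Arrays.asList(
                new PendingTask("a", 1), new PendingTask("b", 5), new PendingTask("c", 3));
        SketchTaskScheduler[] policies = { new FifoScheduler(), new PriorityScheduler() };
        for (SketchTaskScheduler policy : policies) {
            System.out.println(policy.getClass().getSimpleName() + " -> "
                    + names(policy.assignTasks(queue, 2)));
        }
    }
}
```

The point of the design is that the caller only sees the abstract type, so
swapping one policy for another does not touch the calling code.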

Thanks
Hemanth


Re: Task scheduler

2010-05-13 Thread Saurabh Agarwal
Hi Hemanth,

Let me reframe my question. I wanted to know how the JobTracker decides the
assignment of input splits to TaskTrackers based on each TaskTracker's data
locality. Where is this policy defined? Is it pluggable?
Saurabh Agarwal


Re: Task scheduler

2010-05-13 Thread Hemanth Yamijala
Saurabh,

> Let me reframe my question. I wanted to know how the JobTracker decides the
> assignment of input splits to TaskTrackers based on each TaskTracker's data
> locality. Where is this policy defined? Is it pluggable?

Sorry, I misunderstood your question. This code is in
o.a.h.mapred.JobInProgress. It is likely spread across many methods in
the class, but a good starting point would be methods like
obtainNewMapTask or obtainNewReduceTask.

At the moment, this policy is not pluggable. But I know there have
been discussions (possibly even a JIRA, though I can't locate any now)
asking for this capability.
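
For readers who want a feel for what a locality-based choice looks like
before reading JobInProgress, here is a small self-contained sketch. It
is not Hadoop's actual algorithm or code. The class LocalitySketch, the
Split type, the "rack/host" naming convention, and the three distance
levels (node-local, rack-local, off-rack) are all assumptions made for
illustration. The idea shown is simply: when a tracker asks for work,
prefer a split whose data is on that host, then one on the same rack,
then anything else.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of locality-aware split selection; not Hadoop's actual code.
class LocalitySketch {

    /** An input split plus the hosts assumed to hold replicas of its data. */
    static final class Split {
        final String id;
        final List<String> hosts;

        Split(String id, String... hosts) {
            this.id = id;
            this.hosts = Arrays.asList(hosts);
        }
    }

    /** Assumed naming convention for this sketch: "rack/host", e.g. "r1/h1". */
    static String rackOf(String host) {
        int slash = host.indexOf('/');
        return slash < 0 ? host : host.substring(0, slash);
    }

    /** Returns 0 for node-local, 1 for rack-local, 2 for off-rack. */
    static int distance(Split split, String trackerHost) {
        int best = 2;
        for (String h : split.hosts) {
            if (h.equals(trackerHost)) {
                return 0;
            }
            if (rackOf(h).equals(rackOf(trackerHost))) {
                best = 1;
            }
        }
        return best;
    }

    /** Removes and returns the closest pending split for the tracker, or null if none remain. */
    static Split obtainSplitFor(String trackerHost, List<Split> pending) {
        Split bestSplit = null;
        int bestDistance = Integer.MAX_VALUE;
        for (Split s : pending) {
            int d = distance(s, trackerHost);
            if (d < bestDistance) {
                bestDistance = d;
                bestSplit = s;
            }
        }
        if (bestSplit != null) {
            pending.remove(bestSplit);
        }
        return bestSplit;
    }

    public static void main(String[] args) {
        List<Split> pending = new ArrayList<>(Arrays.asList(
                new Split("s1", "r2/h9"),
                new Split("s2", "r1/h2"),
                new Split("s3", "r1/h1")));
        String tracker = "r1/h1";
        Split next;
        while ((next = obtainSplitFor(tracker, pending)) != null) {
            System.out.println(tracker + " gets " + next.id);
        }
    }
}
```

A linear scan like this is only meant to show the preference order; a
real scheduler would need something cheaper than scanning every pending
split on each request.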

Thanks
Hemanth
