Hi,

On Mon, Nov 1, 2010 at 9:13 AM, He Chen <airb...@gmail.com> wrote:
> If you use the default scheduler of hadoop 0.20.2 or higher, the
> jobQueueScheduler will take the data locality into account.
This is true irrespective of the scheduler in use. Other schedulers
currently add a layer to decide which job to pick up first, based on
constraints they choose to satisfy, such as fairness, queue capacities,
etc. Once a job is picked up, the logic for picking a task within the
job is in framework code that all schedulers use.

> That means when a heartbeat from a TT arrives, the JT will first check
> a cache which is a map of nodes and the data-local tasks each node has.
> The JT will assign a node-local task first, then rack-local, non-local,
> recovery and speculative tasks, if they have default priorities.
>
> If a TT gets a non-local task, it will query the nodes which have the
> data and finish this task. You can also decide whether to keep that
> fetched data on this TT by configuring the Hadoop mapred-site.xml file.
>
> BTW, even if a TT gets a data-local task, it may also ask other data
> owners (if you have more than one replica) for data to accelerate the
> process. (??? my understanding, anyone can confirm)

Not that I am aware of. The task's input location is used directly to
read the data.

Thanks
Hemanth

> Hope this will help.
>
> Chen
>
> On Sun, Oct 31, 2010 at 9:49 PM, Zhenhua Guo <jen...@gmail.com> wrote:
>> Thanks!
>> One more question. Is the input file replicated on each node where a
>> mapper is run? Or is just the portion processed by a mapper
>> transferred?
>>
>> Gerald
>>
>> On Fri, Oct 29, 2010 at 10:11 AM, Harsh J <qwertyman...@gmail.com> wrote:
>> > Hello,
>> >
>> > On Fri, Oct 29, 2010 at 12:45 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>> >> TaskTracker will tell JobTracker how many free slots it has through
>> >> the heartbeat. And JobTracker will choose the best TaskTracker with
>> >> consideration of data locality.
>> >
>> > Yes. To add some more: a scheduler is responsible for assigning
>> > tasks (based on various stats, including data locality) to the
>> > proper TaskTrackers.
>> > Scheduler.assignTasks(TaskTracker) is used to assign a TaskTracker
>> > its tasks, and the scheduler type is configurable (some examples
>> > are the Eager/FIFO scheduler, the Capacity scheduler, etc.).
>> >
>> > This scheduling is done when a heartbeat response is to be sent back
>> > to a TaskTracker that called JobTracker.heartbeat(...).
>> >
>> >> On Thu, Oct 28, 2010 at 2:52 PM, Zhenhua Guo <jen...@gmail.com> wrote:
>> >>> Hi, all
>> >>> I wonder how Hadoop schedules mappers and reducers (e.g. does it
>> >>> consider load balancing, affinity to data?). For example, how does
>> >>> it decide on which nodes mappers and reducers are to be executed,
>> >>> and when?
>> >>> Thanks!
>> >>>
>> >>> Gerald
>> >>
>> >> --
>> >> Best Regards
>> >>
>> >> Jeff Zhang
>> >
>> > --
>> > Harsh J
>> > www.harshj.com
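For anyone who wants to see the node-local / rack-local / off-rack
preference order described above in code, here is a minimal, hypothetical
sketch. This is NOT Hadoop's actual JobInProgress/scheduler code; the
class and method names (PendingTask, pickTask) are made up for
illustration, only the selection order mirrors what the thread describes:

```java
import java.util.*;

// Simplified sketch (assumed names, not Hadoop's real API) of picking a
// task for a TaskTracker based on where the task's input data lives.
public class LocalityPick {
    static class PendingTask {
        final String id;
        final Set<String> inputHosts;  // hosts holding this split's replicas
        final Set<String> inputRacks;  // racks containing those hosts
        PendingTask(String id, Set<String> hosts, Set<String> racks) {
            this.id = id; this.inputHosts = hosts; this.inputRacks = racks;
        }
    }

    // Prefer a node-local task, then a rack-local one, then fall back to
    // any remaining (off-rack) task -- the order described in the thread.
    static PendingTask pickTask(List<PendingTask> pending, String host, String rack) {
        for (PendingTask t : pending)
            if (t.inputHosts.contains(host)) return t;    // node-local
        for (PendingTask t : pending)
            if (t.inputRacks.contains(rack)) return t;    // rack-local
        return pending.isEmpty() ? null : pending.get(0); // off-rack fallback
    }

    public static void main(String[] args) {
        List<PendingTask> pending = new ArrayList<>();
        pending.add(new PendingTask("t1", Set.of("nodeB"), Set.of("rack2")));
        pending.add(new PendingTask("t2", Set.of("nodeA"), Set.of("rack1")));
        // A tracker on nodeA/rack1 gets t2 (node-local), even though t1
        // appears first in the pending list.
        System.out.println(pickTask(pending, "nodeA", "rack1").id);
    }
}
```

In real Hadoop the equivalent logic sits in framework code shared by all
schedulers (as Hemanth says above), keyed off a cache mapping nodes to
the data-local tasks they can run.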