Thanks, Jeff, Harsh, He, and Hemanth. That information is quite helpful!

Gerald
On Mon, Nov 1, 2010 at 12:01 AM, Hemanth Yamijala <yhema...@gmail.com> wrote:
> Hi,
>
> On Mon, Nov 1, 2010 at 9:13 AM, He Chen <airb...@gmail.com> wrote:
>> If you use the default scheduler of Hadoop 0.20.2 or higher, the
>> JobQueueTaskScheduler takes data locality into account.
>
> This is true irrespective of the scheduler in use. Other schedulers
> currently add a layer that decides which job to pick up first, based on
> constraints they choose to satisfy - like fairness, queue capacities,
> etc. Once a job is picked, the logic for choosing a task within the job
> is currently in framework code that all schedulers use.
>
>> That means that when a heartbeat from a TT arrives, the JT first checks
>> a cache that maps each node to the data-local tasks that node has. The
>> JT assigns a node-local task first, then rack-local, non-local,
>> recovery, and speculative tasks, assuming they have default priorities.
>>
>> If a TT gets a non-local task, it queries the nodes that hold the data
>> in order to finish the task. You can also decide whether to keep that
>> fetched data on the TT by configuring the Hadoop mapred-site.xml file.
>>
>> BTW, even if a TT gets a data-local task, it may also ask another data
>> owner (if you have more than one replica) for data to accelerate
>> processing. (??? my understanding, can anyone confirm?)
>
> Not that I am aware of. The task's input location is used directly to
> read the data.
>
> Thanks
> Hemanth
>
>> Hope this will help.
>>
>> Chen
>>
>> On Sun, Oct 31, 2010 at 9:49 PM, Zhenhua Guo <jen...@gmail.com> wrote:
>>> Thanks!
>>> One more question: is the input file replicated on each node where a
>>> mapper runs, or is only the portion processed by that mapper
>>> transferred?
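For anyone following along, the node-local / rack-local / non-local fallback
He Chen describes can be sketched roughly as below. This is a toy
illustration under my own naming, not the actual JobTracker/JobInProgress
code - the class `LocalityPicker` and its methods are hypothetical:

```java
import java.util.*;

// Simplified sketch of locality-aware task selection: when a TaskTracker
// heartbeats with a free slot, prefer a task whose input split is stored
// on that node, then one stored on the same rack, then any pending task.
// All names here are illustrative, not real Hadoop internals.
class LocalityPicker {
    // host -> tasks whose input blocks live on that host
    private final Map<String, Deque<String>> nodeLocal = new HashMap<>();
    // rack -> tasks whose input blocks live somewhere on that rack
    private final Map<String, Deque<String>> rackLocal = new HashMap<>();
    private final Deque<String> anyTask = new ArrayDeque<>();

    void addTask(String task, String host, String rack) {
        nodeLocal.computeIfAbsent(host, h -> new ArrayDeque<>()).add(task);
        rackLocal.computeIfAbsent(rack, r -> new ArrayDeque<>()).add(task);
        anyTask.add(task);
    }

    // Called when a TaskTracker on (host, rack) reports a free slot.
    String assign(String host, String rack) {
        String t = peek(nodeLocal.get(host));   // node-local first
        if (t == null) t = peek(rackLocal.get(rack)); // then rack-local
        if (t == null) t = anyTask.peek();      // then non-local
        if (t != null) retire(t);
        return t; // null means no pending tasks remain
    }

    private static String peek(Deque<String> q) {
        return (q == null) ? null : q.peek();
    }

    // Remove the chosen task from every index so it is not assigned twice.
    private void retire(String t) {
        for (Deque<String> q : nodeLocal.values()) q.remove(t);
        for (Deque<String> q : rackLocal.values()) q.remove(t);
        anyTask.remove(t);
    }
}
```

The real implementation also handles recovery and speculative tasks, as
Chen notes, but the fallback ordering is the key idea.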
>>>
>>> Gerald
>>>
>>> On Fri, Oct 29, 2010 at 10:11 AM, Harsh J <qwertyman...@gmail.com> wrote:
>>> > Hello,
>>> >
>>> > On Fri, Oct 29, 2010 at 12:45 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>>> >> The TaskTracker tells the JobTracker how many free slots it has
>>> >> through the heartbeat, and the JobTracker chooses the best
>>> >> TaskTracker with data locality taken into consideration.
>>> >
>>> > Yes. To add some more: a scheduler is responsible for assigning tasks
>>> > (based on various stats, including data locality) to the proper
>>> > TaskTrackers. Scheduler.assignTasks(TaskTracker) is used to assign a
>>> > TaskTracker its tasks, and the scheduler type is configurable (some
>>> > examples are the Eager/FIFO scheduler, the Capacity scheduler, etc.).
>>> >
>>> > This scheduling is done when a heartbeat response is about to be sent
>>> > back to a TaskTracker that called JobTracker.heartbeat(...).
>>> >
>>> >> On Thu, Oct 28, 2010 at 2:52 PM, Zhenhua Guo <jen...@gmail.com> wrote:
>>> >>> Hi, all
>>> >>> I wonder how Hadoop schedules mappers and reducers (e.g. considering
>>> >>> load balancing and affinity to data). For example, how does it
>>> >>> decide on which nodes mappers and reducers are executed, and when?
>>> >>> Thanks!
>>> >>>
>>> >>> Gerald
>>> >>
>>> >> --
>>> >> Best Regards
>>> >>
>>> >> Jeff Zhang
>>> >
>>> > --
>>> > Harsh J
>>> > www.harshj.com
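The pluggable-scheduler hook Harsh mentions can be modeled very roughly as
follows. This is a toy stand-in for the real
org.apache.hadoop.mapred.TaskScheduler API, shown only to make the
heartbeat-driven flow concrete - the interface, class, and method names
below are simplifications of my own:

```java
import java.util.*;

// Toy model of pluggable scheduling: on each heartbeat, the JobTracker
// asks its configured scheduler which tasks the reporting TaskTracker
// should run, given that tracker's free slot count. The names here are
// illustrative, not the real Hadoop TaskScheduler API.
interface Scheduler {
    List<String> assignTasks(String tracker, int freeSlots);
}

// FIFO: hand out tasks strictly in submission order, up to the number of
// free slots the TaskTracker reported in its heartbeat.
class FifoScheduler implements Scheduler {
    private final Deque<String> pending = new ArrayDeque<>();

    void submit(String task) { pending.add(task); }

    @Override
    public List<String> assignTasks(String tracker, int freeSlots) {
        List<String> assigned = new ArrayList<>();
        while (assigned.size() < freeSlots && !pending.isEmpty()) {
            assigned.add(pending.poll());
        }
        return assigned; // sent back in the heartbeat response
    }
}
```

A fairness- or capacity-oriented scheduler would plug in at the same point
but pick from a different job ordering, which matches Hemanth's point that
the schedulers differ in which job they pick, not in the per-job task
selection.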