The equivalent code in the Fair Scheduler is in AppSchedulable.java, under assignContainer(FSSchedulerNode node, boolean reserved).
YARN uses delay scheduling ( http://people.csail.mit.edu/matei/papers/2010/eurosys_delay_scheduling.pdf) for achieving data-locality. -Sandy On Thu, Apr 3, 2014 at 11:16 AM, Shekhar Gupta <shkhr...@gmail.com> wrote: > Hi Brad, > > YARN scheduling does take care of data locality. In YARN, tasks are not > assigned based on capacity. Actually certain number of containers are > allocated on every node based on node's capacity. Tasks are executed on > those containers. While scheduling tasks on containers YARN scheduler > satisfies data locality requirements. I am not very familiar with Fair > Scheduler but if you check the source of FifoScheduler you will find a > function 'assignContainersonNode' which looks like following > > private int assignContainersOnNode(FiCaSchedulerNode node, > FiCaSchedulerApp application, Priority priority > ) { > // Data-local > int nodeLocalContainers = > assignNodeLocalContainers(node, application, priority); > > // Rack-local > int rackLocalContainers = > assignRackLocalContainers(node, application, priority); > > // Off-switch > int offSwitchContainers = > assignOffSwitchContainers(node, application, priority); > > > LOG.debug("assignContainersOnNode:" + > " node=" + node.getRMNode().getNodeAddress() + > " application=" + application.getApplicationId().getId() + > " priority=" + priority.getPriority() + > " #assigned=" + > (nodeLocalContainers + rackLocalContainers + offSwitchContainers)); > > > return (nodeLocalContainers + rackLocalContainers + > offSwitchContainers); > } > > In this routine you will find that data-local tasks are scheduled first, > then rack-local and in then off-switch. > > After this you may find similar function in fairScheduler too. > > I hope this helps. Let me know if you more questions or if something is > wrong in my reasoning. > > Regards, > Shekhar > > > On Thu, Apr 3, 2014 at 10:56 AM, Brad Childs <b...@redhat.com> wrote: > > > Sorry if this is the wrong list, i am looking for deep technical/hadoop > > source help :) > > > > How does job scheduling work on yarn framework for map reduce jobs? I > see > > the yarn scheduler discussed here: > > > https://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/YARN.htmlwhich > leads me to believe tasks are scheduled based on node capacity and > > not data locality. I've sifted through the fair scheduler and can't find > > anything about data location or locality. > > > > Where does data locality play into the scheduling of map/reduce tasks on > > yarn? Can someone point me to the hadoop 2.x source where the data block > > location is used to calculate node/container/task assignment (if thats > > still happening). > > > > > > > > -bc > > > > >