Sorry if this is the wrong list; I am looking for deep technical/Hadoop source
help :)

How does job scheduling work in the YARN framework for MapReduce jobs?  I see
the YARN scheduler discussed here:
https://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/YARN.html
which leads me to believe tasks are scheduled based on node capacity and not
data locality.  I've sifted through the Fair Scheduler and can't find anything
about data location or locality.

Where does data locality play into the scheduling of map/reduce tasks on YARN?
Can someone point me to the Hadoop 2.x source where the data block location is
used to calculate node/container/task assignment (if that's still happening)?



-bc
