Sorry if this is the wrong list; I am looking for deep technical help with the Hadoop source :)
How does job scheduling work in the YARN framework for MapReduce jobs? The YARN scheduler description at https://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/YARN.html leads me to believe tasks are scheduled based on node capacity and not data locality. I've sifted through the Fair Scheduler and can't find anything about data location or locality.

Where does data locality play into the scheduling of map/reduce tasks on YARN? Can someone point me to the Hadoop 2.x source where the data block location is used to calculate node/container/task assignment (if that's still happening)?

-bc