Thanks for your answer, Michael. But today's YARN does not manage HDFS. How does the YARN RM get to know which HDFS blocks are on each data node?
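To check that I understand the rule you describe (prefer the nodes that hold the data, fall back to other nodes when they have no free resources), here is a toy sketch in plain Python. It is not YARN or Spark code: the function name `place_executors`, the slot counts, and the alphabetical fallback order are all assumptions made up for illustration.

```python
def place_executors(block_nodes, free_slots, wanted):
    """Pick nodes for `wanted` executors, preferring nodes that hold input blocks.

    block_nodes: set of nodes holding blocks of the input file (hypothetical
                 input; on a real cluster the NameNode has this information).
    free_slots:  dict mapping node name -> number of free executor slots
                 (a stand-in for "free resources", not a real YARN concept).
    """
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    # Data-local nodes come first, then every other node as a fallback.
    preferred = [n for n in sorted(slots) if n in block_nodes]
    fallback = [n for n in sorted(slots) if n not in block_nodes]
    placement = []
    for node in preferred + fallback:
        while slots[node] > 0 and len(placement) < wanted:
            slots[node] -= 1
            placement.append(node)
    return placement


# Nodes A and E hold the blocks and every node has one free slot.
print(place_executors({"A", "E"}, {"A": 1, "B": 1, "C": 1, "D": 1, "E": 1}, 2))
# A has no free slot, so one executor lands on a node without the data.
print(place_executors({"A", "E"}, {"A": 0, "B": 1, "C": 1, "D": 1, "E": 1}, 2))
```

What the sketch leaves open is who supplies `block_nodes` to whoever makes this decision, and that is the part I am asking about.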
Do you mean that the YARN RM contacts the NameNode for the HDFS block locations on each node, and then decides to launch executors on the nodes that hold the required input blocks?

Either the YARN RM or the Spark driver needs to talk to the NameNode, which knows which HDFS blocks are on each node. I'm still curious which one talks to the NameNode.

Thanks.

On Mon, Jul 13, 2015 at 12:38 PM, Michael Segel <michael_se...@hotmail.com> wrote:

> I believe the short answer is that YARN is responsible for the scheduling
> and will pick where the job runs.
>
> Look at it this way… you’re running a YARN job that runs spark.
>
> Yarn should run the job on A and E, however… if there aren’t enough free
> resources, it will run the job elsewhere.
>
>
> On Jul 13, 2015, at 10:10 AM, Elkhan Dadashov <elkhan8...@gmail.com>
> wrote:
>
> Hi folks,
>
> I have a question regarding scheduling of Spark job on Yarn cluster.
>
> Let's say there are 5 nodes on Yarn cluster: A,B,C, D, E
>
> In Spark job I'll be reading some huge text file (sc.textFile(fileName))
> from HDFS and create an RDD.
>
> Assume that only nodes A, E contain the blocks of that text file.
>
> I'm curious if Spark driver talks to NameNode or Yarn Resource Manager
> talks to NameNode to know the nodes which has required input blocks ?
>
> How does Spark or Yarn launch executors on the nodes having required
> blocks ?
>
> Which component gets that info (data blocks in each node) from NameNode ?
>
> This info is required while launching executors on WorkerNodes for using
> data locality.
>
> How do Spark executors gets launched on nodes A,E ? But not B,C,D ?
>
> Or does Yarn RM launches executors on random nodes, and then if data block
> does not exist in that node extra duplication/copy is one on HDFS ? (copy
> data from A or E to one of {B,C,D} where executor got launched)
>
> Thanks.
>
>
> The opinions expressed here are mine, while they may reflect a cognitive
> thought, that is purely accidental.
> Use at your own risk.
> Michael Segel
> michael_segel (AT) hotmail.com

--
Best regards,
Elkhan Dadashov