Spark acts as a client to HDFS. In other words, it contacts the NameNode of the HDFS cluster, which returns the block locations for the file, and the data is then fetched directly from the DataNodes that hold those blocks. Spark also uses that location metadata to prefer scheduling tasks close to the data. A rough sketch of the lookup is below.
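To make that concrete, here is a minimal Scala sketch using the Hadoop FileSystem API, which is the same metadata path Spark goes through under the hood when it computes preferred locations for tasks. The NameNode URI (hdfs://namenode-host:8020) and the file path (/data/input.txt) are placeholders, not values from the original mail; substitute your own.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object BlockLocations {
      def main(args: Array[String]): Unit = {
        // Placeholder NameNode address; point this at your HDFS cluster.
        val conf = new Configuration()
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020")

        val fs = FileSystem.get(conf)
        val path = new Path("/data/input.txt")
        val status = fs.getFileStatus(path)

        // Ask the NameNode which DataNodes hold each block of the file.
        // Spark uses this same block-location metadata when planning tasks
        // over HDFS-backed input, so it can try to run a task on (or near)
        // a node that already stores the block.
        val locations = fs.getFileBlockLocations(status, 0, status.getLen)
        locations.foreach { block =>
          println(s"offset=${block.getOffset} length=${block.getLength} " +
                  s"hosts=${block.getHosts.mkString(",")}")
        }

        fs.close()
      }
    }

When Spark and HDFS run on separate clusters, the hosts returned here will not match any Spark executor, so tasks simply read the blocks remotely over the network instead of getting data-local placement.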
Date: Tue, 19 Apr 2016 14:00:31 +0530
Subject: Spark + HDFS
From: chaturvedich...@gmail.com
To: user@spark.apache.org

When I use Spark and HDFS on two different clusters, how do the Spark workers know which block of data is available on which HDFS node? Who basically caters to this? Can someone throw some light on this?