Spark will execute as a client for HDFS. In other words, it'll contact the 
NameNode (the HDFS master) for the cluster, which will return the block 
locations, and the data will then be fetched directly from the DataNodes.
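To make that concrete, here's a minimal sketch in Scala (the NameNode host 
and file path are made up): computing the partitions of an RDD read via 
sc.textFile makes Spark query the NameNode for split/block locations through 
the Hadoop InputFormat, and preferredLocations shows the DataNode hosts the 
scheduler will favour for each partition.

    import org.apache.spark.{SparkConf, SparkContext}

    object HdfsLocalityDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("HdfsLocalityDemo"))

        // Spark acts as an HDFS client here: computing the partitions
        // (triggered below) asks the NameNode for the file's block
        // locations via the Hadoop InputFormat.
        val rdd = sc.textFile("hdfs://namenode.example.com:8020/data/input.txt")

        // Each partition carries the DataNode hosts holding its block;
        // the scheduler uses these hints to place tasks close to the data.
        rdd.partitions.foreach { p =>
          println(s"partition ${p.index}: " +
            rdd.preferredLocations(p).mkString(", "))
        }

        sc.stop()
      }
    }

Note that when the Spark workers and the DataNodes live on different 
clusters, these locality hints rarely match a worker host, so every 
partition read goes over the network.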

Date: Tue, 19 Apr 2016 14:00:31 +0530
Subject: Spark + HDFS
From: chaturvedich...@gmail.com
To: user@spark.apache.org

When I use Spark and HDFS on two different clusters, how do the Spark workers 
know which block of data is available on which HDFS node? What component 
basically caters to this?
Can someone throw some light on this?