I run Spark and HDFS on two separate clusters. How do the Spark workers know which block of data is stored on which HDFS node? Which component is responsible for this?
Can someone shed some light on this?