Yes, Spark attempts to achieve data locality (PROCESS_LOCAL or NODE_LOCAL)
where possible, just like MapReduce. For exactly this reason, it is a best
practice to co-locate your Spark Workers on the same nodes as your HDFS
DataNodes (the NameNode only serves block metadata, so it is the DataNodes
that matter for locality).
This is achieved through the RDD.preferredLocations() interface method.
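For reference, here is a minimal Scala sketch (the HDFS path is hypothetical)
showing how those locality hints surface through the public API: when you read
a file from HDFS, Spark fetches block locations from the NameNode, and
RDD.preferredLocations() then reports the DataNodes holding each partition:

  import org.apache.spark.{SparkConf, SparkContext}

  object LocalityDemo {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("LocalityDemo"))

      // Reading an HDFS file: under the hood, the HadoopRDD asks the
      // NameNode for block locations, so each partition records the
      // DataNodes that hold its block.
      val rdd = sc.textFile("hdfs:///path/to/input")

      // preferredLocations() surfaces those hints; the scheduler uses
      // them to try to run each task on a node that already holds the
      // data (NODE_LOCAL where possible).
      rdd.partitions.foreach { p =>
        println(s"partition ${p.index} -> ${rdd.preferredLocations(p).mkString(", ")}")
      }

      sc.stop()
    }
  }

The scheduler treats these locations as hints, not guarantees: if a preferred
node is busy, the task may still run elsewhere after a locality-wait timeout.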
Hi All
Is there any communication between the Spark MASTER node and the Hadoop
NameNode while distributing work to WORKER nodes, like we have in MapReduce?
Please suggest.
TIA
--
Anish Sneh
"Experience is the best teacher."
http://in.linkedin.com/in/anishsneh