yes, Spark attempts to achieve data locality (PROCESS_LOCAL or NODE_LOCAL) where possible, just like MapReduce. it's a best practice to co-locate your Spark Workers on the same nodes as your HDFS DataNodes (the nodes that actually store the blocks, not the NameNode) for just this reason.
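to make the locality levels concrete, here is a rough sketch in plain python. it is only a toy model of the idea, not Spark's scheduler code: the function names, the node and rack names, and the 4-node layout are all made up for illustration, and PROCESS_LOCAL is listed but not modelled.

```python
from enum import IntEnum


class Locality(IntEnum):
    """Locality levels, best (lowest) first. The names mirror Spark's levels."""
    PROCESS_LOCAL = 0  # data already cached in the same executor JVM (not modelled here)
    NODE_LOCAL = 1     # worker runs on a host that stores the block
    RACK_LOCAL = 2     # worker shares a rack with a host that stores the block
    ANY = 3            # data has to come from another rack


def locality_level(worker_host, block_hosts, rack_of):
    """Classify how close worker_host is to the hosts holding a block (hypothetical helper)."""
    if worker_host in block_hosts:
        return Locality.NODE_LOCAL
    block_racks = {rack_of[h] for h in block_hosts if h in rack_of}
    if worker_host in rack_of and rack_of[worker_host] in block_racks:
        return Locality.RACK_LOCAL
    return Locality.ANY


def pick_worker(free_workers, block_hosts, rack_of):
    """Pick the free worker with the best (lowest) locality level for this block."""
    return min(free_workers, key=lambda w: locality_level(w, block_hosts, rack_of))


# Assumed layout: two racks, and the block has a single replica on node2.
RACK_OF = {"node1": "rackA", "node2": "rackA", "node3": "rackB", "node4": "rackB"}
BLOCK_HOSTS = {"node2"}

print(pick_worker(["node3", "node1", "node2"], BLOCK_HOSTS, RACK_OF))  # node2 (NODE_LOCAL)
print(pick_worker(["node3", "node1"], BLOCK_HOSTS, RACK_OF))           # node1 (RACK_LOCAL)
print(locality_level("node3", BLOCK_HOSTS, RACK_OF).name)              # ANY
```

the point of the toy: if no Worker runs on the host that stores a block, the best you can get is RACK_LOCAL or ANY, which is why co-locating Workers with the storage nodes matters.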
this is achieved through the RDD.preferredLocations() interface method:
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD

on a related note, you can configure spark.locality.wait as the number of millis the scheduler waits for a more local slot before falling back to a less-local level (e.g. RACK_LOCAL):
http://spark.apache.org/docs/latest/configuration.html

-chris

On Fri, Jun 13, 2014 at 11:06 PM, [email protected] <[email protected]> wrote:

> Hi All
>
> Is there any communication between the Spark MASTER node and the Hadoop
> NameNode while distributing work to WORKER nodes, like we have in MapReduce?
>
> Please suggest
>
> TIA
>
> --
> Anish Sneh
> "Experience is the best teacher."
> http://in.linkedin.com/in/anishsneh
>
>
> ------------------------------
> * From: * [email protected] <[email protected]>;
> * To: * [email protected] <[email protected]>;
> * Subject: * How Spark Choose Worker Nodes for respective HDFS block
> * Sent: * Fri, Jun 13, 2014 9:17:50 PM
>
> Hi All
>
> I am new to Spark, working on a 3 node test cluster. I am trying to explore
> Spark's scope in analytics; my Spark code interacts with HDFS mostly.
>
> I am confused about how Spark chooses which node it will distribute its
> work to.
>
> We assume that it can be an alternative to Hadoop MapReduce. In
> MapReduce we know that internally the framework will distribute code (or
> logic) to the nearest TaskTracker, which is co-located with the DataNode,
> or in the same rack, or otherwise nearest to the data blocks.
>
> My confusion is: when I give an HDFS path inside a Spark program, how does
> it choose which node is nearest (if it does)?
>
> If it does not, then how will it work when I have TBs of data, where high
> network latency will be involved?
>
> My apologies for asking a basic question, please suggest.
>
> TIA
> --
> Anish Sneh
> "Experience is the best teacher."
> http://www.anishsneh.com
