yes, Spark attempts to achieve data locality (PROCESS_LOCAL or NODE_LOCAL)
where possible, just like MapReduce.  it's a best practice to co-locate your
Spark Workers on the same nodes as your HDFS DataNodes (the nodes that
actually hold the blocks) for just this reason.

this is exposed through the RDD.preferredLocations() method, which returns
the preferred hosts for a given partition:
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD
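
as a rough illustration of how preferred locations turn into a locality
level, here is a toy python sketch.  it is not Spark's actual code: the
function name, host names and rack map are all made up for the example.
note that a task reading an HDFS block from disk is at best NODE_LOCAL;
PROCESS_LOCAL applies when the partition is already cached in the
executor's JVM.

```python
# toy model only, NOT Spark's implementation.
# all names below (locality_level, node1..node4, rackA/rackB) are hypothetical.

def locality_level(preferred_hosts, executor_host, rack_of, cached_in_executor=False):
    """Classify how local a task would be if it ran on executor_host."""
    if cached_in_executor:
        # data already lives in the executor's own JVM
        return "PROCESS_LOCAL"
    if executor_host in preferred_hosts:
        # executor runs on a host that holds a replica of the block
        return "NODE_LOCAL"
    preferred_racks = {rack_of[h] for h in preferred_hosts if h in rack_of}
    if rack_of.get(executor_host) in preferred_racks:
        # same rack as a replica, but a different host
        return "RACK_LOCAL"
    return "ANY"


# hosts assumed to hold replicas of one HDFS block
block_hosts = ["node1", "node2"]
rack_of = {"node1": "rackA", "node2": "rackA", "node3": "rackA", "node4": "rackB"}

print(locality_level(block_hosts, "node2", rack_of))  # NODE_LOCAL
print(locality_level(block_hosts, "node3", rack_of))  # RACK_LOCAL
print(locality_level(block_hosts, "node4", rack_of))  # ANY
```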

on a related note, you can configure spark.locality.wait as the number of
millis to wait before falling back to a less-local level (for example, from
NODE_LOCAL to RACK_LOCAL):
  http://spark.apache.org/docs/latest/configuration.html
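
to make the fallback concrete, here is a simplified python sketch of the
idea.  again, this is a toy model and not Spark's scheduler: the function
name is hypothetical, and it assumes the same wait applies at every level
and that the clock only counts time since the last task was launched.  the
3000 ms default mirrors the documented default for spark.locality.wait.

```python
# toy model of locality fallback, NOT Spark's implementation.
LEVELS = ["PROCESS_LOCAL", "NODE_LOCAL", "RACK_LOCAL", "ANY"]

def allowed_level(ms_since_last_launch, locality_wait_ms=3000):
    """Return the least-local level this toy scheduler would accept."""
    if locality_wait_ms <= 0:
        # a wait of zero means: never hold out for a better-placed slot
        return "ANY"
    # each full wait period that passes relaxes the requirement by one level
    steps = ms_since_last_launch // locality_wait_ms
    return LEVELS[min(steps, len(LEVELS) - 1)]


print(allowed_level(0))      # PROCESS_LOCAL
print(allowed_level(3000))   # NODE_LOCAL
print(allowed_level(6500))   # RACK_LOCAL
print(allowed_level(20000))  # ANY
```

a larger wait favors locality at the cost of idle slots; a smaller one
keeps the cluster busy but moves more data over the network.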

-chris


On Fri, Jun 13, 2014 at 11:06 PM, [email protected] <
[email protected]> wrote:

> Hi All
>
> Is there any communication between Spark MASTER node and Hadoop NameNode
> while distributing work to WORKER nodes, like we have in MapReduce?
>
> Please suggest
>
> TIA
>
> --
> Anish Sneh
> "Experience is the best teacher."
> http://in.linkedin.com/in/anishsneh
>
>
>  ------------------------------
> * From: * [email protected] <[email protected]>;
> * To: * [email protected] <[email protected]>;
>
> * Subject: * How Spark Choose Worker Nodes for respective HDFS block
> * Sent: * Fri, Jun 13, 2014 9:17:50 PM
>
>   Hi All
>
> I am new to Spark, working on a 3-node test cluster. I am trying to explore
> Spark's scope in analytics; my Spark code mostly interacts with HDFS.
>
> I am confused about how Spark chooses which node it will distribute
> its work to.
>
> We assume that it can be an alternative to Hadoop MapReduce. In
> MapReduce we know that internally the framework will distribute code (or
> logic) to the nearest TaskTracker, which is co-located with a DataNode, in
> the same rack, or otherwise nearest to the data blocks.
>
> My confusion is: when I give an HDFS path inside a Spark program, how does
> it choose which node is nearest (if it does)?
>
> If it does not, then how will it work when I have TBs of data, where high
> network latency will be involved?
>
> My apologies for asking a basic question, please suggest.
>
> TIA
> --
> Anish Sneh
> "Experience is the best teacher."
> http://www.anishsneh.com
>
