This currently only works for YARN.  The standalone default is to place an
executor on every node for every job.

The total number of executors is specified by the user.
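
For example, here is a minimal sketch of fixing the executor count up front.
It assumes a YARN deployment and the spark.executor.instances property (the
same setting is usually passed as --num-executors to spark-submit); the exact
property name and whether it is honored can vary by Spark version.

  import org.apache.spark.{SparkConf, SparkContext}

  // Ask YARN for a fixed number of executors before any job runs.
  // The app name and the count of 4 are placeholders.
  val conf = new SparkConf()
    .setAppName("locality-example")
    .set("spark.executor.instances", "4")

  val sc = new SparkContext(conf)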

-Sandy


On Fri, Jul 18, 2014 at 2:00 AM, Haopu Wang <hw...@qilinsoft.com> wrote:

>    Sandy,
>
>
>
> Do you mean the “preferred location” also works for a standalone cluster?
> Because I checked the code of SparkContext and saw the comments below:
>
>
>
>   // This is used only by YARN for now, but should be relevant to other
>   // cluster types (Mesos, etc) too. This is typically generated from
>   // InputFormatInfo.computePreferredLocations. It contains a map from
>   // hostname to a list of input format splits on the host.
>   private[spark] var preferredNodeLocationData: Map[String, Set[SplitInfo]] = Map()
>
>
>
> BTW, even with the preferred hosts, how does Spark decide how many total
> executors to use for this application?
>
>
>
> Thanks again!
>
>
>  ------------------------------
>
> From: Sandy Ryza [mailto:sandy.r...@cloudera.com]
> Sent: Friday, July 18, 2014 3:44 PM
> To: user@spark.apache.org
> Subject: Re: data locality
>
>
>
> Hi Haopu,
>
>
>
> Spark will ask HDFS for file block locations and try to assign tasks based
> on these.
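>
> As a rough illustration of that (hypothetical path, and an already-created
> SparkContext sc is assumed), you can ask an RDD which hosts hold the HDFS
> blocks behind each of its partitions:
>
>   val rdd = sc.textFile("hdfs:///user/haopu/myfile.txt")
>
>   // For each partition, Spark records the datanodes reported by HDFS;
>   // the scheduler then tries to run the corresponding task on one of them.
>   rdd.partitions.foreach { p =>
>     val hosts = rdd.preferredLocations(p).mkString(", ")
>     println(s"partition ${p.index} -> $hosts")
>   }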
>
>
>
> There is a snag.  Spark schedules its tasks inside of "executor" processes
> that stick around for the lifetime of a Spark application.  Spark requests
> executors before it runs any jobs, i.e. before it has any information about
> where the input data for the jobs is located.  If the executors occupy
> significantly fewer nodes than exist in the cluster, it can be difficult
> for Spark to achieve data locality.  The workaround for this is an API that
> allows passing in a set of preferred locations when instantiating a Spark
> context.  This API is currently broken in Spark 1.0, and will likely be
> changed to something a little simpler in a future release.
>
>
>
> val locData = InputFormatInfo.computePreferredLocations(
>   Seq(new InputFormatInfo(conf, classOf[TextInputFormat], new Path("myfile.txt"))))
>
> val sc = new SparkContext(conf, locData)
>
>
>
> -Sandy
>
>
>
>
>
> On Fri, Jul 18, 2014 at 12:35 AM, Haopu Wang <hw...@qilinsoft.com> wrote:
>
> I have a standalone Spark cluster and an HDFS cluster which share some of
> the nodes.
>
>
>
> When reading an HDFS file, how does Spark assign tasks to nodes? Will it ask
> HDFS for the location of each file block in order to pick the right worker node?
>
>
>
> How about a Spark cluster on YARN?
>
>
>
> Thank you very much!
>
>
>
>
>
