Andrew,

Thanks for your answer. It validates our finding. Unfortunately, client mode assumes that I'm running on a "privileged" node. What I mean by privileged is a node that has network access to all the workers and vice versa. This is a big assumption to make, and it is unreasonable in certain circumstances.
I would much rather have a single point of contact, like a job server (such as Ooyala's), that handles jar uploading and manages driver lifecycles. I think these are basic requirements for standalone clusters.

Gino B.

> On Jun 24, 2014, at 1:32 PM, Andrew Or <and...@databricks.com> wrote:
>
> Hi Randy and Gino,
>
> The issue is that standalone-cluster mode is not officially supported. Please use standalone-client mode instead, i.e. specify --deploy-mode client in spark-submit, or simply leave out this config, because it defaults to client mode.
>
> Unfortunately, this is not currently documented anywhere, and the existing explanation of the distinction between cluster and client modes is highly misleading. In general, cluster mode means the driver runs on one of the worker nodes, just like the executors. The corollary is that the output of the application is not forwarded to the command that launched the application (spark-submit in this case), but is accessible instead through the worker logs. In contrast, client mode means the command that launches the application also launches the driver, while the executors still run on the worker nodes. This means the spark-submit command also returns the output of the application. For instance, it doesn't make sense to run the Spark shell in cluster mode, because the stdin / stdout / stderr will not be redirected to the spark-submit command.
>
> If you are hosting your own cluster and can launch applications from within the cluster, then there is little benefit to launching your application in cluster mode, which is primarily intended to cut down the latency between the driver and the executors in the first place. However, if you are still intent on using standalone-cluster mode after all, you can use the deprecated way of launching org.apache.spark.deploy.Client directly through bin/spark-class. Note that this is not recommended and only serves as a temporary workaround until we fix standalone-cluster mode through spark-submit.
>
> I have filed the relevant issues: https://issues.apache.org/jira/browse/SPARK-2259 and https://issues.apache.org/jira/browse/SPARK-2260. Thanks for pointing this out, and we will get to fixing these shortly.
>
> Best,
> Andrew
>
>
> 2014-06-20 6:06 GMT-07:00 Gino Bustelo <lbust...@gmail.com>:
>> I've found that the jar will be copied to the worker from HDFS fine, but it is not added to the Spark context for you. You have to know that the jar will end up in the driver's working directory, so you just add the file name of the jar to the context in your program.
>>
>> In your example below, just add "test.jar" to the context.
>>
>> Btw, the context will not have the master URL either, so add that while you are at it.
>>
>> This is a big issue. I posted about it a week ago and got no replies. Hopefully it gets more attention as more people start hitting this. Basically, spark-submit on a standalone cluster with the cluster deploy mode is broken.
>>
>> Gino B.
>>
>> > On Jun 20, 2014, at 2:46 AM, randylu <randyl...@gmail.com> wrote:
>> >
>> > in addition, jar file can be copied to driver node automatically.
>> >
>> >
>> > --
>> > View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/problem-about-cluster-mode-of-spark-1-0-0-tp7982p7984.html
>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
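
For anyone who hits this before SPARK-2259 and SPARK-2260 are fixed, the workaround Gino describes above boils down to something like the sketch below. This is only a minimal Scala sketch under a few assumptions: the application name "MyApp", the master URL spark://master-host:7077, and the jar name test.jar are placeholders to replace with your own values.

    import org.apache.spark.{SparkConf, SparkContext}

    object MyApp {
      def main(args: Array[String]): Unit = {
        // In the broken standalone-cluster mode the master URL is not passed
        // through to the application, so set it explicitly here.
        // "spark://master-host:7077" is a placeholder.
        val conf = new SparkConf()
          .setAppName("MyApp")
          .setMaster("spark://master-host:7077")
        val sc = new SparkContext(conf)

        // The application jar ends up in the driver's working directory,
        // so add it by its bare file name to make it available to the executors.
        sc.addJar("test.jar")

        // ... application logic goes here ...

        sc.stop()
      }
    }

If you can launch from a node that has network access to all the workers, the simpler route is the one Andrew suggests: pass --deploy-mode client to spark-submit (or omit the flag, since client is the default), in which case none of the above is needed.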