Andrew,

Thanks for your answer. It validates our finding. Unfortunately, client mode assumes that I'm running on a "privileged" node. What I mean by privileged is a node that has network access to all the workers and vice versa. This is a big assumption to make, and it is unreasonable in certain circumstances.
I would much rather have a single point of contact, like a job server (such as Ooyala's), that handles jar uploading and manages driver lifecycles. I think these are basic requirements for standalone clusters.

Gino B.

> On Jun 24, 2014, at 1:32 PM, Andrew Or <and...@databricks.com> wrote:
>
> Hi Randy and Gino,
>
> The issue is that standalone-cluster mode is not officially supported. Please use standalone-client mode instead, i.e. specify --deploy-mode client in spark-submit, or simply leave out this config, because it defaults to client mode.
>
> Unfortunately, this is not currently documented anywhere, and the existing explanation of the distinction between cluster and client modes is highly misleading. In general, cluster mode means the driver runs on one of the worker nodes, just like the executors. The corollary is that the output of the application is not forwarded to the command that launched the application (spark-submit in this case), but is accessible instead through the worker logs. In contrast, client mode means the command that launches the application also launches the driver, while the executors still run on the worker nodes. This means the spark-submit command also returns the output of the application. For instance, it doesn't make sense to run the Spark shell in cluster mode, because the stdin / stdout / stderr will not be redirected to the spark-submit command.
>
> If you are hosting your own cluster and can launch applications from within the cluster, then there is little benefit to launching your application in cluster mode, which is primarily intended to cut down the latency between the driver and the executors in the first place. However, if you are still intent on using standalone-cluster mode after all, you can use the deprecated way of launching org.apache.spark.deploy.Client directly through bin/spark-class. Note that this is not recommended and only serves as a temporary workaround until we fix standalone-cluster mode through spark-submit.
>
> I have filed the relevant issues: https://issues.apache.org/jira/browse/SPARK-2259 and https://issues.apache.org/jira/browse/SPARK-2260. Thanks for pointing this out, and we will get to fixing these shortly.
>
> Best,
> Andrew
>
>
> 2014-06-20 6:06 GMT-07:00 Gino Bustelo <lbust...@gmail.com>:
>> I've found that the jar will be copied to the worker from HDFS fine, but it is not added to the Spark context for you. You have to know that the jar will end up in the driver's working directory, so you just add the file name of the jar to the context in your program.
>>
>> In your example below, just add "test.jar" to the context.
>>
>> Btw, the context will not have the master URL either, so add that while you are at it.
>>
>> This is a big issue. I posted about it a week ago and got no replies. Hopefully it gets more attention as more people start hitting this. Basically, spark-submit on a standalone cluster with the cluster deploy mode is broken.
>>
>> Gino B.
>>
>> > On Jun 20, 2014, at 2:46 AM, randylu <randyl...@gmail.com> wrote:
>> >
>> > in addition, jar file can be copied to driver node automatically.
>> >
>> >
>> > --
>> > View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/problem-about-cluster-mode-of-spark-1-0-0-tp7982p7984.html
>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
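
For anyone who hits this before SPARK-2259 and SPARK-2260 are fixed, the workaround Gino describes above boils down to something like the sketch below. This is only a minimal Scala sketch under a few assumptions: the application name "MyApp", the master URL spark://master-host:7077, and the jar name test.jar are placeholders to replace with your own values.

    import org.apache.spark.{SparkConf, SparkContext}

    object MyApp {
      def main(args: Array[String]): Unit = {
        // In the broken standalone-cluster mode the master URL is not passed
        // through to the application, so set it explicitly here.
        // "spark://master-host:7077" is a placeholder.
        val conf = new SparkConf()
          .setAppName("MyApp")
          .setMaster("spark://master-host:7077")
        val sc = new SparkContext(conf)

        // The application jar ends up in the driver's working directory,
        // so add it by its bare file name to make it available to the executors.
        sc.addJar("test.jar")

        // ... application logic goes here ...

        sc.stop()
      }
    }

If you can launch from a node that has network access to all the workers, the simpler route is the one Andrew suggests: pass --deploy-mode client to spark-submit (or omit the flag, since client is the default), in which case none of the above is needed.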