I am not sure how the process works and whether patches are applied to all upcoming versions of Spark. Is it likely that the fix is available in this build (spark 1.6.0 17-Dec-2015 09:02)? http://people.apache.org/~pwendell/spark-nightly/spark-master-bin/latest/
Thanks!

On Wed, Dec 16, 2015 at 9:22 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Since both Scala and Java files are involved in the PR, I don't see an
> easy way around it without building Spark yourself.
>
> Cheers
>
> On Wed, Dec 16, 2015 at 10:18 AM, Saiph Kappa <saiph.ka...@gmail.com>
> wrote:
>
>> Exactly, but it's only fixed for the next Spark version. Is there any
>> workaround for version 1.5.2?
>>
>> On Wed, Dec 16, 2015 at 4:36 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> This seems related:
>>> [SPARK-10123][DEPLOY] Support specifying deploy mode from configuration
>>>
>>> FYI
>>>
>>> On Wed, Dec 16, 2015 at 7:31 AM, Saiph Kappa <saiph.ka...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a client application running on host0 that launches multiple
>>>> drivers on multiple remote standalone Spark clusters (each cluster
>>>> runs on a single machine):
>>>>
>>>> «
>>>> ...
>>>>
>>>> List("host1", "host2", "host3").foreach(host => {
>>>>
>>>>   val sparkConf = new SparkConf()
>>>>   sparkConf.setAppName("App")
>>>>
>>>>   sparkConf.set("spark.driver.memory", "4g")
>>>>   sparkConf.set("spark.executor.memory", "4g")
>>>>   sparkConf.set("spark.driver.maxResultSize", "4g")
>>>>   sparkConf.set("spark.serializer",
>>>>     "org.apache.spark.serializer.KryoSerializer")
>>>>   sparkConf.set("spark.executor.extraJavaOptions",
>>>>     " -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC " +
>>>>     "-XX:+AggressiveOpts -XX:FreqInlineSize=300 -XX:MaxInlineSize=300 ")
>>>>
>>>>   sparkConf.setMaster(s"spark://$host:7077")
>>>>
>>>>   val rawStreams = (1 to source.parallelism).map(_ =>
>>>>     ssc.textFileStream("/home/user/data/")).toArray
>>>>   val rawStream = ssc.union(rawStreams)
>>>>   rawStream.count.map(c => s"Received $c records.").print()
>>>>
>>>> })
>>>> ...
>>>>
>>>> »
>>>>
>>>> The problem is that I'm getting an error message saying that the
>>>> directory "/home/user/data/" does not exist. In fact, this directory
>>>> only exists on host1, host2, and host3, not on host0. But since I'm
>>>> launching the drivers on host1..3, I thought the data would be read
>>>> from those machines.
>>>>
>>>> I'm also trying to avoid using the spark-submit script, and couldn't
>>>> find the configuration parameter for specifying the deploy mode.
>>>>
>>>> Is there any way to specify the deploy mode through a configuration
>>>> parameter?
>>>>
>>>> Thanks.
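
For reference, a minimal sketch of what the SPARK-10123 change enables, assuming a build that already contains it (it was merged for Spark 1.6.0, so a stock 1.5.2 build would still need the patch applied and rebuilt). The property name spark.submit.deployMode comes from that PR; the host and app name below are just the ones used in this thread:

  import org.apache.spark.SparkConf

  val sparkConf = new SparkConf()
    .setAppName("App")
    .setMaster("spark://host1:7077")
    // With SPARK-10123, the deploy mode can come from configuration
    // (set here, or in spark-defaults.conf) instead of only from
    // spark-submit's --deploy-mode flag. In "cluster" mode the driver
    // runs inside the remote standalone cluster, so a path such as
    // "/home/user/data/" is resolved on host1 rather than on the
    // machine that launched the application.
    .set("spark.submit.deployMode", "cluster")

Whether a given nightly includes the fix depends on when that nightly was cut relative to the merge; comparing the build's commit log against the PR is the only reliable way to tell.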