Sandy, I experienced the same behavior Koert just mentioned. I don't understand why there is a difference between using spark-submit and programmatic execution. Maybe there is something else we need to add to the SparkConf/SparkContext in order to launch Spark jobs programmatically that wasn't needed before? For reference, a minimal sketch of what we are doing is below.
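This is roughly our setup (a sketch only; the master, app name, and jar path are placeholders for our real values, and these are things I believe spark-submit normally supplies for you). We also export SPARK_JAR to point at the Spark assembly before launching, since the YARN backend seems to expect it:

    import org.apache.spark.{SparkConf, SparkContext}

    object ProgrammaticSubmit {
      def main(args: Array[String]): Unit = {
        // Settings that spark-submit would otherwise supply on our behalf.
        val conf = new SparkConf()
          .setMaster("yarn-client")               // client mode on YARN
          .setAppName("programmatic-submit-test") // placeholder name
          // Ship the application jar explicitly, since no launcher script
          // is preparing the distributed cache for us. Path is illustrative.
          .setJars(Seq("/local/path/to/my-app-assembly.jar"))

        val sc = new SparkContext(conf)
        try {
          // Trivial job, just to verify that executors actually come up.
          println(sc.parallelize(1 to 1000).count())
        } finally {
          sc.stop()
        }
      }
    }

With this, the application master starts but then fails looking for the jar at the driver-side path, exactly as Koert described.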
On Wed, Jul 9, 2014 at 12:14 PM, Koert Kuipers <ko...@tresata.com> wrote:

> sandy, that makes sense. however i had trouble doing programmatic
> execution on yarn in client mode as well. the application-master in yarn
> came up but then bombed because it was looking for jars that dont exist (it
> was looking in the original file paths on the driver side, which are not
> available on the yarn node). my guess is that spark-submit is changing some
> settings (perhaps preparing the distributed cache and modifying settings
> accordingly), which makes it harder to run things programmatically. i could
> be wrong however. i gave up debugging and resorted to using spark-submit
> for now.
>
>
> On Wed, Jul 9, 2014 at 12:05 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
>
>> Spark still supports the ability to submit jobs programmatically without
>> shell scripts.
>>
>> Koert,
>> The main reason that the unification can't be a part of SparkContext is
>> that YARN and standalone support deploy modes where the driver runs in a
>> managed process on the cluster. In this case, the SparkContext is created
>> on a remote node well after the application is launched.
>>
>>
>> On Wed, Jul 9, 2014 at 8:34 AM, Andrei <faithlessfri...@gmail.com> wrote:
>>
>>> One another +1. For me it's a question of embedding. With
>>> SparkConf/SparkContext I can easily create larger projects with Spark as a
>>> separate service (just like MySQL and JDBC, for example). With spark-submit
>>> I'm bound to Spark as a main framework that defines how my application
>>> should look like. In my humble opinion, using Spark as embeddable library
>>> rather than main framework and runtime is much easier.
>>>
>>>
>>> On Wed, Jul 9, 2014 at 5:14 PM, Jerry Lam <chiling...@gmail.com> wrote:
>>>
>>>> +1 as well for being able to submit jobs programmatically without using
>>>> shell script.
>>>>
>>>> we also experience issues of submitting jobs programmatically without
>>>> using spark-submit. In fact, even in the Hadoop World, I rarely used
>>>> "hadoop jar" to submit jobs in shell.
>>>>
>>>>
>>>> On Wed, Jul 9, 2014 at 9:47 AM, Robert James <srobertja...@gmail.com> wrote:
>>>>
>>>>> +1 to be able to do anything via SparkConf/SparkContext. Our app
>>>>> worked fine in Spark 0.9, but, after several days of wrestling with
>>>>> uber jars and spark-submit, and so far failing to get Spark 1.0
>>>>> working, we'd like to go back to doing it ourself with SparkConf.
>>>>>
>>>>> As the previous poster said, a few scripts should be able to give us
>>>>> the classpath and any other params we need, and be a lot more
>>>>> transparent and debuggable.
>>>>>
>>>>> On 7/9/14, Surendranauth Hiraman <suren.hira...@velos.io> wrote:
>>>>> > Are there any gaps beyond convenience and code/config separation in
>>>>> > using spark-submit versus SparkConf/SparkContext if you are willing
>>>>> > to set your own config?
>>>>> >
>>>>> > If there are any gaps, +1 on having parity within
>>>>> > SparkConf/SparkContext where possible. In my use case, we launch our
>>>>> > jobs programmatically. In theory, we could shell out to spark-submit
>>>>> > but it's not the best option for us.
>>>>> >
>>>>> > So far, we are only using Standalone Cluster mode, so I'm not
>>>>> > knowledgeable on the complexities of other modes, though.
>>>>> >
>>>>> > -Suren
>>>>> >
>>>>> >
>>>>> > On Wed, Jul 9, 2014 at 8:20 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>> >
>>>>> >> not sure I understand why unifying how you submit app for different
>>>>> >> platforms and dynamic configuration cannot be part of SparkConf and
>>>>> >> SparkContext?
>>>>> >>
>>>>> >> for classpath a simple script similar to "hadoop classpath" that
>>>>> >> shows what needs to be added should be sufficient.
>>>>> >>
>>>>> >> on spark standalone I can launch a program just fine with just
>>>>> >> SparkConf and SparkContext. not on yarn, so the spark-launch script
>>>>> >> must be doing a few things extra there I am missing... which makes
>>>>> >> things more difficult because I am not sure its realistic to expect
>>>>> >> every application that needs to run something on spark to be
>>>>> >> launched using spark-submit.
>>>>> >>
>>>>> >> On Jul 9, 2014 3:45 AM, "Patrick Wendell" <pwend...@gmail.com> wrote:
>>>>> >>
>>>>> >>> It fulfills a few different functions. The main one is giving users a
>>>>> >>> way to inject Spark as a runtime dependency separately from their
>>>>> >>> program and make sure they get exactly the right version of Spark. So
>>>>> >>> a user can bundle an application and then use spark-submit to send it
>>>>> >>> to different types of clusters (or using different versions of Spark).
>>>>> >>>
>>>>> >>> It also unifies the way you bundle and submit an app for Yarn, Mesos,
>>>>> >>> etc... this was something that became very fragmented over time before
>>>>> >>> this was added.
>>>>> >>>
>>>>> >>> Another feature is allowing users to set configuration values
>>>>> >>> dynamically rather than compile them inside of their program. That's
>>>>> >>> the one you mention here. You can choose to use this feature or not.
>>>>> >>> If you know your configs are not going to change, then you don't need
>>>>> >>> to set them with spark-submit.
>>>>> >>>
>>>>> >>>
>>>>> >>> On Wed, Jul 9, 2014 at 10:22 AM, Robert James <srobertja...@gmail.com>
>>>>> >>> wrote:
>>>>> >>> > What is the purpose of spark-submit? Does it do anything outside of
>>>>> >>> > the standard val conf = new SparkConf ... val sc = new SparkContext
>>>>> >>> > ... ?
>>>>> >
>>>>> >
>>>>> > --
>>>>> >
>>>>> > SUREN HIRAMAN, VP TECHNOLOGY
>>>>> > Velos
>>>>> > Accelerating Machine Learning
>>>>> >
>>>>> > 440 NINTH AVENUE, 11TH FLOOR
>>>>> > NEW YORK, NY 10001
>>>>> > O: (917) 525-2466 ext. 105
>>>>> > F: 646.349.4063
>>>>> > E: suren.hiraman@velos.io <suren.hira...@sociocast.com>
>>>>> > W: www.velos.io
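To make Patrick's point about dynamic configuration concrete, the difference between compiling settings into the program and letting spark-submit supply them looks roughly like this (a sketch; the master URL and memory value are placeholders):

    import org.apache.spark.SparkConf

    // Baked-in configuration: the same jar cannot be pointed at a
    // different cluster or resized without a rebuild.
    val fixedConf = new SparkConf()
      .setMaster("spark://master-host:7077") // placeholder URL
      .set("spark.executor.memory", "2g")

    // Deferred configuration: a default SparkConf picks up any spark.*
    // system properties, which is how settings passed on the spark-submit
    // command line or through spark-defaults.conf reach the program
    // without being compiled in.
    val deferredConf = new SparkConf()

If the configs never change, the first form works fine, which I think is exactly the choice Patrick describes.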