Thanks Andrew. We cannot include Spark in our Java project due to dependency issues. Spark will not be exposed to clients. What we want to do is put the Spark tarball (in the worst case) into HDFS, and then, from our Java app which runs in local mode, launch the spark-submit script with the users' Python files. So the only input to spark-submit will be users' Python scripts.

What do I need to set up to call spark-submit directly from HDFS? What is the reason that spark-submit cannot be run from HDFS directly if the Spark tarball is in HDFS?

My intention is to launch the spark-submit script through a Hadoop map job. The Hadoop map job will run from our app, but it will launch the Spark job in the YARN cluster by running the spark-submit script.
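Concretely, the map task would look something like this (a rough, untested sketch; the HDFS path, local directories, and tarball layout are placeholders, not our real setup):

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Each map task stages Spark from HDFS to local disk, then launches
// spark-submit for one user Python script (the map input value).
public class SparkSubmitMapper extends Mapper<NullWritable, Text, NullWritable, Text> {

    @Override
    protected void map(NullWritable key, Text userPyScript, Context context)
            throws IOException, InterruptedException {
        // Scripts cannot be exec'ed from an hdfs:// URI, so copy the
        // tarball to the node's local filesystem first.
        FileSystem fs = FileSystem.get(context.getConfiguration());
        fs.copyToLocalFile(new Path("/apps/spark/spark-dist.tgz"),  // placeholder HDFS path
                           new Path("/tmp/spark-dist.tgz"));        // placeholder local path

        // Unpack so bin/spark-submit and its jars are available locally.
        new ProcessBuilder("tar", "-xzf", "/tmp/spark-dist.tgz", "-C", "/tmp")
                .inheritIO().start().waitFor();

        // Launch the user's Python script on the YARN cluster.
        ProcessBuilder pb = new ProcessBuilder(
                "/tmp/spark/bin/spark-submit",   // placeholder: depends on tarball layout
                "--master", "yarn-cluster",
                userPyScript.toString());
        pb.environment().put("HADOOP_CONF_DIR", "/etc/hadoop/conf"); // placeholder
        pb.inheritIO();
        int exitCode = pb.start().waitFor();
        context.write(NullWritable.get(), new Text("spark-submit exit code: " + exitCode));
    }
}

Thanks.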
On Fri, Jun 19, 2015, 6:58 PM Andrew Or <and...@databricks.com> wrote:

> Hi Elkhan,
>
> Spark submit depends on several things: the launcher jar (1.3.0+ only),
> the spark-core jar, and the spark-yarn jar (in your case). Why do you want
> to put it in HDFS though? AFAIK you can't execute scripts directly from
> HDFS; you need to copy them to a local file system first. I don't see clear
> benefits over just running Spark submit from source or from one of the
> distributions.
>
> -Andrew
>
> 2015-06-19 10:12 GMT-07:00 Elkhan Dadashov <elkhan8...@gmail.com>:
>
>> Hi all,
>>
>> If I want to ship the spark-submit script to HDFS and then call it from
>> its HDFS location to start a Spark job, which other files/folders/jars
>> need to be transferred into HDFS along with the spark-submit script?
>>
>> Due to some dependency issues, we cannot include Spark in our Java
>> application, so instead we will allow limited usage of Spark only with
>> Python files.
>>
>> So if I want to put the spark-submit script into HDFS and call it to
>> execute a Spark job in the YARN cluster, what else needs to be put into
>> HDFS along with it?
>>
>> (Using Spark only for executing Spark jobs written in Python)
>>
>> Thanks.
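P.S. Regarding the launcher jar you mentioned: if we ever resolve the dependency issues on our side, my understanding is that the programmatic route (org.apache.spark.launcher, available since 1.3.0) would look roughly like this (sketch only; the SPARK_HOME and script paths are placeholders):

import org.apache.spark.launcher.SparkLauncher;

public class PySparkJobLauncher {
    public static void main(String[] args) throws Exception {
        // Requires the spark-launcher dependency on the classpath.
        Process spark = new SparkLauncher()
                .setSparkHome("/opt/spark")          // placeholder: local Spark distribution
                .setAppResource("/tmp/user_job.py")  // placeholder: the user's Python script
                .setMaster("yarn-cluster")
                .launch();
        int exitCode = spark.waitFor();
        System.out.println("Spark job finished with exit code " + exitCode);
    }
}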