I'm working on a wrapper [1] around Spark for the Julia programming
language [2], similar to PySpark. I've got it working with a Spark
Standalone server by creating a local JVM and setting the master
programmatically. However, this approach doesn't work with YARN (and
probably Mesos), which require running via `spark-submit`.
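For reference, the working Standalone path boils down to something like
this sketch (the master URL and app name here are placeholders, not the
wrapper's actual values):

import org.apache.spark.{SparkConf, SparkContext}

// Standalone mode: the master URL is set programmatically on the
// driver side, so no spark-submit is involved.
val conf = new SparkConf()
  .setMaster("spark://master-host:7077")  // placeholder URL
  .setAppName("Sparta.jl")
val sc = new SparkContext(conf)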

In `SparkSubmit` class I see that for Python a special class `PythonRunner`
is launched, so I tried to do similar `JuliaRunner`, which essentially does
the following:

// ProcessBuilder takes String varargs (or a java.util.List),
// not a Scala Seq
val pb = new ProcessBuilder("julia", juliaScript)
val process = pb.start()
process.waitFor()


where `juliaScript` itself creates a new JVM and a `SparkContext` inside
it WITHOUT setting the master URL. I then tried to launch this class
using

spark-submit --master yarn \
    --class o.a.s.a.j.JuliaRunner \
    project.jar my_script.jl
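For clarity, the JVM started from the Julia script does roughly this
(a sketch; the app name is a placeholder):

import org.apache.spark.{SparkConf, SparkContext}

// Note: no setMaster() call -- the hope was that spark-submit
// would provide the master somehow.
val conf = new SparkConf().setAppName("Sparta.jl")
val sc = new SparkContext(conf)  // this is where it fails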

I expected that `spark-submit` would set environment variables or system
properties that `SparkContext` would then read to connect to the
appropriate master. This didn't happen, however, and the process failed
while trying to instantiate `SparkContext`, saying that the master is
not specified.
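The only workaround I can think of (a sketch, assuming `spark-submit`
sets `spark.master` as a system property in the runner's JVM;
`JULIA_SPARK_MASTER` is a made-up name) is to forward the master to the
child process myself:

// Hypothetical variant of JuliaRunner: read the master from the
// runner JVM's system properties and hand it to the Julia process
// through an environment variable.
val pb = new ProcessBuilder("julia", juliaScript)
sys.props.get("spark.master").foreach { master =>
  pb.environment().put("JULIA_SPARK_MASTER", master)  // made-up name
}
pb.inheritIO()  // forward the child's stdout/stderr
pb.start().waitFor()

The Julia-side JVM could then read `JULIA_SPARK_MASTER` and pass it to
`setMaster`, but that feels like reimplementing what `spark-submit`
already does.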

So what am I missing? How can I use `spark-submit` to run the driver in
a non-JVM language?


[1]: https://github.com/dfdx/Sparta.jl
[2]: http://julialang.org/
