Hi developers, I've run into a problem with Spark, and before opening an
issue, I'd like to hear your thoughts.
Currently, if you want to submit a Spark job, you'll need to write the code,
make a jar, and then submit it with spark-submit or
org.apache.spark.launcher.SparkLauncher.
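For reference, the usual jar-based workflow can be driven from code by running spark-submit as a child process. Below is a minimal sketch using only the Scala standard library; JobSpec, SubmitSketch, the jar path and the class name are hypothetical placeholders, and it assumes spark-submit is on the PATH.

```scala
import scala.sys.process._

// Hypothetical description of an already packaged job (not a Spark type).
final case class JobSpec(jar: String, mainClass: String, master: String, deployMode: String)

object SubmitSketch {
  // Build the spark-submit command line for a packaged job.
  def command(spec: JobSpec, sparkSubmit: String = "spark-submit"): Seq[String] =
    Seq(
      sparkSubmit,
      "--class", spec.mainClass,
      "--master", spec.master,
      "--deploy-mode", spec.deployMode,
      spec.jar
    )

  // Run spark-submit as a child process, wait for it, and return its exit code.
  def submit(spec: JobSpec): Int = Process(command(spec)).!
}
```

Calling SubmitSketch.submit(JobSpec("/path/to/demo.jar", "com.example.Demo", "yarn", "cluster")) would block until spark-submit exits. It still requires a jar built ahead of time, which is exactly the limitation described next.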
But sometimes the RDD operation chain is built dynamically at runtime, for
example from SQL or even a GUI, so packaging a separate jar is either
inconvenient or impossible. So I tried something like the following:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("Demo").setMaster("yarn-client")
    val sc = new SparkContext(conf)
    // A simple word count
    sc.textFile("README.md")
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .foreach(println)

When this code is executed, a Spark job is submitted. However, some problems
remain:
1. It doesn't support every deploy mode; yarn-cluster, for example, does not
   work.
2. Because of the "only one SparkContext per JVM" limit, I cannot run this
   twice.
3. It runs in the same process as my code; no child process is created.
What I would like is for Spark itself to handle these problems. My request
can be summed up as "add a submit() method to SparkContext /
StreamingContext / SQLContext". Ideally, I could add one line after the code
above, like this:

    sc.submit()

and Spark would then handle all of the submission work in the background for
me.
I opened an issue for this request before, but I couldn't make myself clear
back then, so I'm writing this email to discuss it with you. Please reply if
you need more details. I'll open an issue if you understand the request and
think it's worth doing.
Thanks a lot.
Yuhang Chen.