Hi, I want to pull data from about 1,500 remote Oracle tables with Spark, and I'd like a multi-threaded application where each thread picks up one table (or maybe 10 tables) and launches a Spark job to read from its respective table(s).
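To make the idea concrete, here is a rough sketch of what I have in mind (Scala): a single shared SparkSession, a fixed thread pool, and each thread submitting its own JDBC read as a separate Spark job. The JDBC URL, credentials, table list, and output path are all placeholders, and the FAIR scheduler setting just follows what the docs quoted below describe.

import java.util.Properties
import java.util.concurrent.Executors

import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

import org.apache.spark.sql.SparkSession

object ParallelOraclePull {
  def main(args: Array[String]): Unit = {
    // One shared SparkSession / SparkContext; every thread submits jobs through it.
    val spark = SparkSession.builder()
      .appName("parallel-oracle-pull")
      .config("spark.scheduler.mode", "FAIR") // fair scheduling within the application
      .getOrCreate()

    // Placeholder connection details -- real values would come from config.
    val jdbcUrl = "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1"
    val props = new Properties()
    props.setProperty("user", "someuser")
    props.setProperty("password", "somepassword")
    props.setProperty("driver", "oracle.jdbc.OracleDriver")
    val tableNames: Seq[String] = Seq("TABLE_A", "TABLE_B", "TABLE_C") // ~1,500 in reality

    // Fixed pool of threads; each Future submits its own Spark job (read + write).
    implicit val ec: ExecutionContext =
      ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(10))

    val work = tableNames.map { table =>
      Future {
        val df = spark.read.jdbc(jdbcUrl, table, props)
        df.write.mode("overwrite").parquet(s"/landing/$table")
      }
    }

    // Block until every table has been pulled, then shut down.
    Await.result(Future.sequence(work), Duration.Inf)
    spark.stop()
  }
}

Is this pattern sound, or does one SparkContext fall over with this many concurrent jobs?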
I read the official Spark docs at https://spark.apache.org/docs/latest/job-scheduling.html, which say: "...cluster managers that Spark runs on provide facilities for scheduling across applications. Second, *within* each Spark application, multiple "jobs" (Spark actions) may be running concurrently if they were submitted by different threads. This is common if your application is serving requests over the network. Spark includes a fair scheduler to schedule resources within each SparkContext."

You might also have noticed this SO post, https://stackoverflow.com/questions/30862956/concurrent-job-execution-in-spark, where a similar question got no accepted answer and the most upvoted answer starts with "This is not really in the spirit of Spark." To that I'd say: (a) everyone knows it's not in the spirit of Spark, and (b) who cares what the spirit of Spark is; that doesn't actually mean anything.

Has anyone gotten something like this to work before? Did you have to do anything special? I was thinking of sending a message to the dev list too, since maybe whoever wrote that page can give a little more color on the statement above.

Thanks,
Mike