Hi, I want to pull data from about 1,500 remote Oracle tables with Spark, and I'd like a multi-threaded application where each thread picks up one table (or maybe 10 tables) and launches a Spark job to read from its respective table(s).
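To make the idea concrete, here is a rough sketch of what I have in mind (Scala): a single shared SparkSession, a fixed thread pool, and each thread submitting its own JDBC read as a separate Spark job. The JDBC URL, credentials, table list, and output path are all placeholders, and the FAIR scheduler setting just follows what the docs quoted below describe.

import java.util.Properties
import java.util.concurrent.Executors

import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

import org.apache.spark.sql.SparkSession

object ParallelOraclePull {
  def main(args: Array[String]): Unit = {
    // One shared SparkSession / SparkContext; every thread submits jobs through it.
    val spark = SparkSession.builder()
      .appName("parallel-oracle-pull")
      .config("spark.scheduler.mode", "FAIR") // fair scheduling within the application
      .getOrCreate()

    // Placeholder connection details -- real values would come from config.
    val jdbcUrl = "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1"
    val props = new Properties()
    props.setProperty("user", "someuser")
    props.setProperty("password", "somepassword")
    props.setProperty("driver", "oracle.jdbc.OracleDriver")
    val tableNames: Seq[String] = Seq("TABLE_A", "TABLE_B", "TABLE_C") // ~1,500 in reality

    // Fixed pool of threads; each Future submits its own Spark job (read + write).
    implicit val ec: ExecutionContext =
      ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(10))

    val work = tableNames.map { table =>
      Future {
        val df = spark.read.jdbc(jdbcUrl, table, props)
        df.write.mode("overwrite").parquet(s"/landing/$table")
      }
    }

    // Block until every table has been pulled, then shut down.
    Await.result(Future.sequence(work), Duration.Inf)
    spark.stop()
  }
}

Is this pattern sound, or does one SparkContext fall over with this many concurrent jobs?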
I read the official Spark docs at https://spark.apache.org/docs/latest/job-scheduling.html, which say: "...cluster managers that Spark runs on provide facilities for scheduling across applications. Second, *within* each Spark application, multiple "jobs" (Spark actions) may be running concurrently if they were submitted by different threads. This is common if your application is serving requests over the network. Spark includes a fair scheduler to schedule resources within each SparkContext."

You might also have noticed this SO post, https://stackoverflow.com/questions/30862956/concurrent-job-execution-in-spark, where a similar question got no accepted answer and the most upvoted answer starts with "This is not really in the spirit of Spark." To that I'd say: (a) everyone knows it's not in the spirit of Spark, and (b) who cares what the spirit of Spark is; that doesn't actually mean anything.

Has anyone gotten something like this to work before? Did you have to do anything special? I was thinking of sending a message to the dev list too, since maybe whoever wrote that page can give a little more color on the statement above.

Thanks,
Mike