Hi all,

The parallelism of queries executed in a given SparkContext can be controlled via spark.default.parallelism.

I have a scenario where I need to run multiple concurrent queries in a single context, while ensuring the concurrent queries can share resources without starving one another. I would like to control parallelism so that each query has its own upper cap on resource usage. I do not see a solution for this at present: the system currently supports context-level resource assignment, which means the allocated resources are shared among all queries executing in the same context.

For example: my SparkContext is started over a cluster of 100 cores, with two concurrent queries, Query1 and Query2. I want Query1 restricted to 5 cores and Query2 restricted to 10 cores.

As I am very new to Spark development, a simple solution I see is to maintain a mapping of (executionId -> coresUsed) in TaskSchedulerImpl and enforce the cap in resourceOfferSingleTaskSet, but I am not sure this is a good idea (it may have adverse effects on parallelize, makeRDD, etc.). Any suggestions? Or please correct me if there is a problem in my use case.

Regards,
Ajith
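
P.S. To make the idea concrete, below is a minimal, self-contained sketch of the per-execution bookkeeping I have in mind. The names here (ExecutionCoreTracker, tryAcquire, release) are purely illustrative and are not existing Spark internals; the actual change would have to live inside TaskSchedulerImpl around resourceOfferSingleTaskSet.

import scala.collection.mutable

// Illustrative sketch only -- these names are made up and are not Spark internals.
// Idea: keep (executionId -> coresUsed) and refuse further offers once an
// execution would exceed its configured core cap.
class ExecutionCoreTracker(coreCaps: Map[Long, Int]) {
  private val coresUsed = mutable.Map.empty[Long, Int].withDefaultValue(0)

  // Try to reserve `cores` for the given execution; true if the cap allows it.
  def tryAcquire(executionId: Long, cores: Int): Boolean = synchronized {
    val cap = coreCaps.getOrElse(executionId, Int.MaxValue)
    if (coresUsed(executionId) + cores <= cap) {
      coresUsed(executionId) += cores
      true
    } else {
      false
    }
  }

  // Give the cores back when the task finishes.
  def release(executionId: Long, cores: Int): Unit = synchronized {
    coresUsed(executionId) = math.max(0, coresUsed(executionId) - cores)
  }
}

object ExecutionCoreTrackerDemo extends App {
  // Query1 capped at 5 cores, Query2 at 10 cores (cluster has 100 cores total).
  val tracker = new ExecutionCoreTracker(Map(1L -> 5, 2L -> 10))
  println(tracker.tryAcquire(1L, 5))  // true  -> Query1 reaches its cap
  println(tracker.tryAcquire(1L, 1))  // false -> further cores for Query1 refused
  println(tracker.tryAcquire(2L, 8))  // true  -> Query2 is still under its cap
}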
I have a scenario where need to run multiple concurrent queries in a single context, but so that to ensure concurrent queries shall be able to utilize the resources without resource starvation. If need to control parallelism such that, each query has its own upper cap of resource usage. I do not see a solution as currently system supports context level resource assignment, which means the resources allocated will be shared between the queries which are executing in the same context. For Example: My SparkContext has started over a cluster of 100 cores, with two concurrent queries Query1 and Query2. Now Query1 to be restricted to use 5 cores and Query 2 to be restricted to 10 cores As I am very new to spark development, a simple solution I see is maintain a ( executionId vs coresUsed ) in TaskSchedulerImpl and hence controlling it @ resourceOfferSingleTaskSet but not sure it will be a good idea. ( seeing it may have adverse effect over parallelize, makeRDD etc) Any suggestions.? or please correct me if there is a problem in my use case. Regards Ajith