Hi all

The parallelism of queries executed in a given SparkContext can be controlled via
spark.default.parallelism.
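
For instance, it can be set when the context is created (a minimal sketch):

  import org.apache.spark.{SparkConf, SparkContext}

  // spark.default.parallelism controls the default number of partitions
  // (and hence tasks) for operations like parallelize / makeRDD.
  val conf = new SparkConf()
    .setAppName("parallelism-demo")
    .set("spark.default.parallelism", "16")
  val sc = new SparkContext(conf)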

I have a scenario where I need to run multiple concurrent queries in a single 
context, while ensuring that the concurrent queries can utilize the resources 
without starving each other.
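
To make the scenario concrete, the queries are submitted roughly like this (only 
a sketch; spark here is assumed to be the single shared SparkSession and the 
query text is a placeholder):

  import java.util.concurrent.Executors
  import scala.concurrent.{ExecutionContext, Future}

  // Separate threads so the two queries really run concurrently
  // against the same SparkContext.
  implicit val ec: ExecutionContext =
    ExecutionContext.fromExecutor(Executors.newFixedThreadPool(2))

  val q1 = Future { spark.sql("SELECT ... -- Query1").collect() }
  val q2 = Future { spark.sql("SELECT ... -- Query2").collect() }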

I need to control parallelism such that each query has its own upper cap on 
resource usage. I do not see a solution, as currently the system supports 
context-level resource assignment, which means the allocated resources are 
shared between the queries executing in the same context.

For example: my SparkContext has been started over a cluster of 100 cores, with 
two concurrent queries, Query1 and Query2. Now Query1 should be restricted to 
using 5 cores and Query2 restricted to 10 cores.
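
In other words, what I am looking for is something along these lines, reusing 
the futures setup above. Note that spark.query.maxCores below is purely 
hypothetical; it does not exist today and is only meant to illustrate the 
desired behaviour:

  // Hypothetical per-query cap, set on the submitting thread in the same
  // way scheduler local properties are set today.
  val q1 = Future {
    spark.sparkContext.setLocalProperty("spark.query.maxCores", "5")
    spark.sql("SELECT ... -- Query1").collect()
  }
  val q2 = Future {
    spark.sparkContext.setLocalProperty("spark.query.maxCores", "10")
    spark.sql("SELECT ... -- Query2").collect()
  }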

As I am very new to Spark development, a simple solution I can see is 
maintaining an ( executionId vs coresUsed ) map in TaskSchedulerImpl and 
enforcing the cap in resourceOfferSingleTaskSet, but I am not sure it is a good 
idea (it may have adverse effects on parallelize, makeRDD, etc.).
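
Roughly, the bookkeeping I had in mind would look something like the sketch 
below. This is only an illustration: the map, canLaunch, recordLaunch and the 
cap lookup are made-up names, and the real offer loop in 
resourceOfferSingleTaskSet is of course more involved.

  import scala.collection.mutable

  // In TaskSchedulerImpl, cpusPerTask would be CPUS_PER_TASK and the caps
  // would come from configuration; here they are placeholders.
  val cpusPerTask = 1
  val coresUsedByExecution = mutable.HashMap[String, Int]()
  val capByExecution = Map("execution-1" -> 5, "execution-2" -> 10)

  // Only offer a task for this execution if it is still below its cap.
  def canLaunch(execId: String): Boolean =
    coresUsedByExecution.getOrElse(execId, 0) + cpusPerTask <=
      capByExecution.getOrElse(execId, Int.MaxValue)

  // Account for the cores once a task is actually launched.
  def recordLaunch(execId: String): Unit =
    coresUsedByExecution(execId) =
      coresUsedByExecution.getOrElse(execId, 0) + cpusPerTask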

Any suggestions? Or please correct me if there is a problem in my use case.

Regards
Ajith
