Hi All,

We have a YARN cluster with 352 nodes (10 TB memory, 3000 cores) running the Fair Scheduler, with 230 queues under the root queue. Each queue is configured with maxResources equal to the total cluster resources.
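For reference, a minimal sketch of our fair-scheduler.xml layout (the queue name and the preemption timeout/threshold values are illustrative, not our exact settings):

<?xml version="1.0"?>
<allocations>
  <!-- one of the 230 leaf queues; the name is illustrative -->
  <queue name="queueA">
    <!-- maxResources equals the full cluster capacity (10 TB, 3000 cores) -->
    <maxResources>10485760 mb, 3000 vcores</maxResources>
  </queue>
  <!-- ... 229 more queues configured the same way ... -->

  <!-- preemption is enabled via yarn.scheduler.fair.preemption=true in
       yarn-site.xml; the timeout and threshold values below are illustrative -->
  <defaultFairSharePreemptionTimeout>60</defaultFairSharePreemptionTimeout>
  <defaultFairSharePreemptionThreshold>0.5</defaultFairSharePreemptionThreshold>
</allocations>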
When a Spark job is submitted to queue A on an otherwise idle cluster, its instantaneous fair share is the whole cluster, so it is allocated the full 10 TB and 3000 cores and holds them without releasing anything. When another job is later submitted to queue B, preemption reclaims only queue B's steady fair share of (10 TB, 3000 cores) / 230, i.e. about 45 GB and 13 cores. If more jobs are submitted to queue B, all of them have to share those 45 GB and 13 cores, while the job in queue A keeps holding the rest of the cluster and starves the other jobs. This happens regularly whenever a Spark job is submitted first and grabs the entire cluster.

What is the best way to fix this? Can preemption be made to reclaim resources up to the instantaneous fair share instead of the steady fair share, and would that help?

Note:

1. We do not want to give a higher weight to any particular queue, because all 230 queues are equally critical.

2. Restructuring the queues into nested queues does not solve the issue.

3. Setting a smaller maxResources on each queue would stop the first job from taking the entire cluster, but configuring an optimal maxResources for 230 queues is difficult, and it would also prevent the first job from using the whole cluster when it is otherwise idle.

4. We do not want to handle this in the Spark ApplicationMaster, since we would then have to do the same for every other YARN application type with similar behavior. We want YARN itself to control this by preempting (killing) containers that the first job has held for too long.

Thanks,
Prabhu Joseph