Re: Spark Job on YARN Hogging the entire Cluster resource

2016-02-24 Thread Prabhu Joseph
YARN-2026 has fixed the issue.

Re: Spark Job on YARN Hogging the entire Cluster resource

2016-02-24 Thread Prabhu Joseph
You are right, Hamel. It should get 10TB/2. In hadoop-2.7.0 it works fine, but in hadoop-2.5.1 it gets only 10TB/230, with the same configuration used in both versions. So I think a JIRA must have fixed the issue somewhere after hadoop-2.5.1.

Re: Spark Job on YARN Hogging the entire Cluster resource

2016-02-24 Thread Hamel Kothari
The instantaneous fair share is what Queue B should get according to the code (and my experience). Assuming your queues are all equal, it would be 10TB/2. I can't help much more unless I can see your config files and ideally also the YARN Scheduler UI, to get an idea of what your queues and actual resource usage look like.
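
For concreteness, the arithmetic implied here (using the 10TB / 3000-core cluster and 230 configured queues described in the original post): with only two queues active, the instantaneous fair share of each is 10TB / 2 = 5TB and 3000 / 2 = 1500 cores. If the scheduler instead divides by every configured queue, each queue's share drops to 10TB / 230, roughly 44.5GB (taking 1TB = 1024GB), which matches the behavior reported on hadoop-2.5.1.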

Re: Spark Job on YARN Hogging the entire Cluster resource

2016-02-24 Thread Prabhu Joseph
Hi Hamel, Thanks for looking into the issue. What I am not understanding is: after preemption, what share does the second queue get if the first queue holds the entire cluster resource without releasing it, the instantaneous fair share or the steady-state fair share? Queues A and B are there; a job in Queue A holds the entire cluster, and a second job is submitted to Queue B.

Re: Spark Job on YARN Hogging the entire Cluster resource

2016-02-24 Thread Hamel Kothari
If all queues are identical, this behavior should not be happening. Preemption as designed in the fair scheduler (IIRC) takes place based on the instantaneous fair share, not the steady-state fair share. The fair scheduler docs describe the distinction between the two in more detail.
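
For reference, fair-scheduler preemption has to be switched on explicitly. A minimal sketch of the relevant settings (the timeout value is illustrative; the per-allocation-file element name shown is the one in the Hadoop 2.6+ docs, while 2.5 used a top-level fairSharePreemptionTimeout):

    yarn-site.xml:
      <property>
        <name>yarn.scheduler.fair.preemption</name>
        <value>true</value>
      </property>

    fair-scheduler.xml (allocation file):
      <allocations>
        <!-- wait 30s below fair share before preempting (illustrative) -->
        <defaultFairSharePreemptionTimeout>30</defaultFairSharePreemptionTimeout>
      </allocations>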

Spark Job on YARN Hogging the entire Cluster resource

2016-02-23 Thread Prabhu Joseph
Hi All, A YARN cluster with 352 nodes (10TB, 3000 cores) uses the Fair Scheduler, with the root queue having 230 queues. Each queue is configured with maxResources equal to the total cluster resource. When a Spark job is submitted into queue A, it is given 10TB and 3000 cores according to its instantaneous fair share, and it holds the entire cluster resource without releasing it, so a job submitted later into another queue waits for resources. A sketch of such an allocation file is below.
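
For illustration, a fair-scheduler allocation file matching the setup described above might look like this sketch (the queue names and the 10TB / 3000-core figures come from the post; only two of the 230 queues are shown, and 10TB is written as 10485760 MB):

    <allocations>
      <queue name="A">
        <maxResources>10485760 mb, 3000 vcores</maxResources>
      </queue>
      <queue name="B">
        <maxResources>10485760 mb, 3000 vcores</maxResources>
      </queue>
      <!-- ... remaining 228 queues configured the same way ... -->
    </allocations>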