Hi,

> On 02.11.2016 at 18:36, Joshua Baker-LePain <j...@salilab.org> wrote:
> 
> On our cluster, we have three queues per host, each with as many slots as the 
> host has physical cores.  The queues are configured as follows:
> 
> o lab.q (high priority queue for cluster "owners")
>   - load_thresholds       np_load_avg=1.5
> o short.q (for jobs <30 minutes)
>   - load_thresholds       np_load_avg=1.25
> o long.q (low priority queue available to all users)
>   - load_thresholds       np_load_avg=0.9
> 
> The idea is that we want long.q to stop accepting jobs when a node is fully 
> loaded (read: load = physical core count) and short.q to stop accepting jobs 
> when a node is 50% overloaded.  This has worked well for a long while.
> 
> On nodes that support it (and not all of ours do), we leave hyperthreading on 
> as it is a net win on those nodes.  As core counts have increased, though, a 
> problem has become blindingly obvious -- the above scheme doesn't work 
> anymore.  long.q never goes into alarm mode since the load doesn't hit the 
> NCPU reported by SGE.  This is true on both OGS 2011.11p1 and SoGE 8.1.9.
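
If I follow the numbers correctly, that would be expected: np_load_avg is 
load_avg divided by the number of processors SGE detects, and with 
hyperthreading that count includes the logical CPUs. So on, say, a node with 
16 physical cores and HT enabled (NCPU=32), keeping 16 slots busy only gets 
you to

    np_load_avg = 16 / 32 = 0.5

which never crosses the 0.9 threshold of long.q. (The 16 and 32 are just 
example numbers, not taken from your setup.)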

As the load is just the number of runnable processes in the run queue*, it 
should certainly get at least up to the number of available cores. Did you 
also increase the number of slots for these machines (and in the PEs)? What is 
`uptime` showing? What happens to the reported load when you run some jobs in 
the background on these nodes, outside of SGE?
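
Something like the following should show what the kernel reports versus what 
SGE sees (node01 is just a placeholder for one of your exec hosts):

    # load average as the kernel reports it
    uptime

    # NCPU and the load values as SGE sees them
    qhost -h node01
    qhost -F load_avg,np_load_avg -h node01

    # create some load outside of SGE, then watch the values again
    for i in 1 2 3 4; do yes > /dev/null & done
    sleep 60; uptime; qhost -F np_load_avg -h node01
    kill %1 %2 %3 %4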

-- Reuti

*) Nowadays this also includes tasks in uninterruptible sleep (e.g. waiting 
for I/O).


> I thought I could fix this using load_scaling on the exec hosts with 
> hyperthreading, but I can't get it to work.  I've tried defining "load_avg=2" 
> and/or "np_load_avg=2", but neither setting seems to have any effect.  What 
> am I doing wrong?
> 
> Thanks.
> 
> -- 
> Joshua Baker-LePain
> QB3 Shared Cluster Sysadmin
> UCSF


