Hi,

> On 02.11.2016 at 18:36, Joshua Baker-LePain <j...@salilab.org> wrote:
>
> On our cluster, we have three queues per host, each with as many slots as the
> host has physical cores. The queues are configured as follows:
>
> o lab.q (high priority queue for cluster "owners")
>   - load_thresholds np_load_avg=1.5
> o short.q (for jobs <30 minutes)
>   - load_thresholds np_load_avg=1.25
> o long.q (low priority queue available to all users)
>   - load_thresholds np_load_avg=0.9
>
> The theory is that we want long.q to stop accepting jobs when a node is fully
> loaded (read: load = physical core count) and short.q to stop accepting jobs
> when a node is 50% overloaded. This has worked well for a long while.
>
> On nodes that support it (and not all of ours do), we leave hyperthreading on
> as it is a net win on those nodes. As core counts have increased, though, a
> problem has become blindingly obvious -- the above scheme doesn't work
> anymore. long.q never goes into alarm mode since the load doesn't hit the
> NCPU reported by SGE. This is true on both OGS 2011.11p1 and SoGE 8.1.9.
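
To make the arithmetic behind the quoted scheme concrete, here is a minimal
sketch in Python. It assumes that np_load_avg is the load average divided by
the processor count SGE reports for the host, that this count is the number of
logical CPUs when hyperthreading is on, and that a queue goes into alarm once
the value reaches the threshold. The function names and the 16-core host are
made up for illustration and are not part of SGE.

```python
def np_load_avg(load_avg: float, num_proc: int) -> float:
    """Load average normalised by the processor count reported for the host."""
    return load_avg / num_proc


def in_alarm(load_avg: float, num_proc: int, threshold: float) -> bool:
    """True if the normalised load has reached the queue's load threshold.

    Assumption: the alarm state is entered at value >= threshold.
    """
    return np_load_avg(load_avg, num_proc) >= threshold


# Hypothetical host: 16 physical cores, one busy process per core.
physical_cores = 16
load = 16.0

# Hyperthreading off: 16 processors reported, long.q threshold 0.9 is reached.
print(in_alarm(load, physical_cores, 0.9))

# Hyperthreading on: 32 logical processors reported, so the same load
# normalises to only 0.5 and long.q stays out of alarm.
print(np_load_avg(load, 2 * physical_cores))
print(in_alarm(load, 2 * physical_cores, 0.9))
```

Under these assumptions, a hyperthreaded host with slots equal to its physical
core count tops out near np_load_avg=0.5 when every slot is busy, which would
explain why the 0.9 threshold is never reached.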

As the load is just the number of eligible processes in the run queue*, it
should certainly reach at least the number of available cores. Did you also
increase the number of slots for these machines (in the PEs too)? What does
`uptime` show? What happens to the reported load when you run some jobs in
the background, outside of SGE, on these nodes?

-- Reuti

*) Nowadays including uninterruptible kernel tasks.

> I thought I could fix this using load_scaling on the exec hosts with
> hyperthreading, but I can't get it to work. I try to define "load_avg=2"
> and/or "np_load_avg=2", but none of these configurations seem to have any
> effect. What am I doing wrong?
>
> Thanks.
>
> --
> Joshua Baker-LePain
> QB3 Shared Cluster Sysadmin
> UCSF
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users