On 02.11.2016 at 21:47, Joshua Baker-LePain wrote:
> On Wed, 2 Nov 2016 at 11:13am, Reuti wrote
>
>>> On 02.11.2016 at 18:36, Joshua Baker-LePain <j...@salilab.org> wrote:
>>>
>>> On our cluster, we have three queues per host, each with as many slots as
>>> the host has physical cores. The queues are configured as follows:
>>>
>>> o lab.q (high priority queue for cluster "owners")
>>>   - load_thresholds np_load_avg=1.5
>>> o short.q (for jobs <30 minutes)
>>>   - load_thresholds np_load_avg=1.25
>>> o long.q (low priority queue available to all users)
>>>   - load_thresholds np_load_avg=0.9
>>>
>>> The theory is that we want long.q to stop accepting jobs when a node is
>>> fully loaded (read: load = physical core count) and short.q to stop
>>> accepting jobs when a node is 50% overloaded. This has worked well
>>> for a long while.
>>
>> As the load is just the number of eligible processes in the run queue*, it
>> should for sure get at least up to the number of available cores. Did you
>> increase the number of slots for these machines too (also PEs)? What is
>> `uptime` showing? What happens with the reported load when you run some
>> jobs in the background, outside of SGE, on these nodes?
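For reference, I read the setup above as per-queue thresholds along these
lines (just a sketch using the queue names you listed; adjust to however you
actually maintain the queue definitions):

$ qconf -mattr queue load_thresholds np_load_avg=1.5 lab.q
$ qconf -mattr queue load_thresholds np_load_avg=1.25 short.q
$ qconf -mattr queue load_thresholds np_load_avg=0.9 long.q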
Just for the record: to investigate this, I defined a load_threshold which
always puts the queue into the alarm state, in addition to the one under
test. I used our tmpfree complex for it and entered a value which is beyond
the installed disk. This way, `qstat -explain a` will always give output,
and even the values of the other complexes whose thresholds aren't exceeded
are displayed. I got:

$ qstat -explain a -q serial@node29 -s r
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
serial@node29                  B     0/0/16         15.75    lx24-em64t    a
        alarm hl:tmpfree=1842222120k load-threshold=2T
        alarm hl:np_load_avg=0.492188 load-threshold=0.5

$ qstat -explain a -q serial@node29 -s r
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
serial@node29                  B     0/0/16         15.75    lx24-em64t    a
        alarm hl:tmpfree=1842222120k load-threshold=2T
        alarm hl:np_load_avg= 9.844 load-threshold=0.5

$ qstat -explain a -q serial@node29 -s r
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
serial@node29                  B     0/0/16         15.76    lx24-em64t    a
        alarm hl:tmpfree=1842221988k load-threshold=2T
        alarm hl:np_load_avg= 0.246 load-threshold=0.5

for settings of NONE, 20, and 0.5 in the load_scaling of np_load_avg of the
exechost. Looks fine. Hence your np_load_avg=2 should have worked.

-- Reuti

> I don't think I was entirely clear above. We still consider a fully loaded
> node to be one using as many slots as there are *physical* cores. So each
> queue is defined to have as many slots as there are physical cores. Our
> goals with the queues are these:
>
> 1) If a node is running a full load of lab.q jobs, long.q should go into
> alarm and not accept any jobs.
>
> 2) That same fully loaded node should accept jobs in short.q until it is
> 50% overloaded, at which time short.q should also go into alarm.
>
> 3) Conversely, if a node is running a full load of long.q jobs, it should
> still accept a full load of lab.q jobs.
>
> As an example, here's a non-hyperthreaded node:
>
> $ qhost -q -h iq116
> iq116        linux-x64    8  9.93   15.6G    4.0G    4.0G  196.3M
>    lab.q     BP    0/8/8
>    short.q   BP    0/2/8
>    long.q    BP    0/0/8    a
>
> lab.q is full and short.q is still accepting jobs, but long.q is in alarm,
> as intended. Here's a hyperthreaded node:
>
> $ qhost -q -h msg-id1
> HOSTNAME     ARCH       NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> ----------------------------------------------------------------------------------
> global       -             -    -    -    -     -       -       -       -       -
> msg-id1      lx-amd64     48    2   24   48 24.52  251.6G    2.2G    4.0G     0.0
>    lab.q     BP    0/24/24
>    short.q   BP    0/0/24
>    long.q    BP    0/0/24
>
> So even though lab.q is full, long.q isn't in alarm.
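(Checking the numbers in the qhost output above, under the assumption that
load_scaling simply multiplies the raw load value before the threshold
comparison:

  np_load_avg = load_avg / num_proc = 24.52 / 48 ≈ 0.51
  with np_load_avg scaling of 2.0:    0.51 * 2   ≈ 1.02, above long.q's 0.9

so if the scaling were honored, long.q on msg-id1 should be in alarm. In my
test above I switched the scaling on the exechost with `qconf -me node29`;
something like `qconf -mattr exechost load_scaling np_load_avg=0.5 node29`
should do the same in one step.)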
> Here's how that node shows up in qconf:
>
> $ qconf -se msg-id1
> hostname              msg-id1.ic.ucsf.edu
> load_scaling          np_load_avg=2.000000
> complex_values        mem_free=256000M
> load_values           arch=lx-amd64,num_proc=48,mem_total=257673.273438M, \
>                       swap_total=4095.996094M,virtual_total=261769.269531M, \
>                       m_topology=SCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTT, \
>                       m_socket=2,m_core=24,m_thread=48,load_avg=24.520000, \
>                       load_short=24.490000,load_medium=24.520000, \
>                       load_long=24.500000,mem_free=255421.792969M, \
>                       swap_free=4095.996094M,virtual_free=259517.789062M, \
>                       mem_used=2251.480469M,swap_used=0.000000M, \
>                       virtual_used=2251.480469M,cpu=50.000000, \
>                       m_topology_inuse=SCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTT, \
>                       np_load_avg=0.510833,np_load_short=0.510208, \
>                       np_load_medium=0.510833,np_load_long=0.510417
> processors            48
>
> Given I have both hyperthreaded and non-hyperthreaded nodes, I can't just
> change the value of the queue's np_load_avg load_threshold. I thought
> load_scaling was the answer, but it's not having any effect that I can see.
>
> --
> Joshua Baker-LePain
> QB3 Shared Cluster Sysadmin
> UCSF

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users