On 02.11.2016 at 21:47, Joshua Baker-LePain wrote:

> On Wed, 2 Nov 2016 at 11:13am, Reuti wrote
> 
>>> On 02.11.2016 at 18:36, Joshua Baker-LePain <j...@salilab.org> wrote:
>>> 
>>> On our cluster, we have three queues per host, each with as many slots as 
>>> the host has physical cores.  The queues are configured as follows:
>>> 
>>> o lab.q (high priority queue for cluster "owners")
>>>  - load_thresholds       np_load_avg=1.5
>>> o short.q (for jobs <30 minutes)
>>>  - load_thresholds       np_load_avg=1.25
>>> o long.q (low priority queue available to all users)
>>>  - load_thresholds       np_load_avg=0.9
>>> 
>>> The theory is that we want long.q to stop accepting jobs when a node is 
>>> fully loaded (read: load = physical core count) and short.q to stop 
>>> accepting jobs when a node is 50% overloaded.  This has worked well 
>>> for a long while.
>> 
>> As the load is just the number of eligible processes in the run queue*, it 
>> should certainly reach at least the number of available cores. Did you 
>> increase the number of slots for these machines too (and also in the PEs)? 
>> What is `uptime` showing? What happens to the reported load when you run 
>> some jobs in the background outside of SGE on these nodes?
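(As a side note for the archive: the three load_thresholds quoted above can be 
set per cluster queue with something along these lines, assuming they live in 
the queue definitions themselves rather than in host-specific overrides:

$ qconf -mattr queue load_thresholds np_load_avg=1.5 lab.q
$ qconf -mattr queue load_thresholds np_load_avg=1.25 short.q
$ qconf -mattr queue load_thresholds np_load_avg=0.9 long.q

or interactively with `qconf -mq <queue>`.)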

Just for the record: to investigate this, I added an entry to load_thresholds 
that always puts the queue into alarm state, alongside the one under test. I 
used our tmpfree complex for it and entered a value beyond the installed disk 
capacity. This way, `qstat -explain a` always gives output, and even the values 
of complexes whose thresholds aren't exceeded get displayed. I got:

$ qstat -explain a -q serial@node29 -s r
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
serial@node29                  B     0/0/16         15.75    lx24-em64t    a
        alarm hl:tmpfree=1842222120k load-threshold=2T
        alarm hl:np_load_avg=0.492188 load-threshold=0.5

$ qstat -explain a -q serial@node29 -s r
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
serial@node29                  B     0/0/16         15.75    lx24-em64t    a
        alarm hl:tmpfree=1842222120k load-threshold=2T
        alarm hl:np_load_avg=   9.844 load-threshold=0.5

$ qstat -explain a -q serial@node29 -s r
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
serial@node29                  B     0/0/16         15.76    lx24-em64t    a
        alarm hl:tmpfree=1842221988k load-threshold=2T
        alarm hl:np_load_avg=   0.246 load-threshold=0.5

for load_scaling settings of NONE, 20 and 0.5 for np_load_avg on the exechost 
(in the order of the outputs above). Looks fine. Hence your np_load_avg=2 
should have worked.
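For reference, the whole test can be reproduced with something along these 
lines (queue name "serial" and host node29 as in the output above; the tmpfree 
value only has to exceed the installed disk, 2T in my case). qconf -mattr 
should do it, editing with `qconf -mq serial` or `qconf -me node29` works as 
well:

$ qconf -mattr queue load_thresholds np_load_avg=0.5,tmpfree=2T serial
$ qconf -mattr exechost load_scaling NONE node29
$ qconf -mattr exechost load_scaling np_load_avg=20 node29
$ qconf -mattr exechost load_scaling np_load_avg=0.5 node29

with one load_scaling setting active per run, in the order of the three 
outputs above.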

-- Reuti


> I don't think I was entirely clear above.  We still consider a fully loaded 
> node to be one using as many slots as there are *physical* cores. So each 
> queue is defined to have as many slots as there are physical cores.  Our 
> goals with the queues are as follows:
> 
> 1) If a node is running full load of lab.q jobs, long.q should go into
>   alarm and not accept any jobs.
> 
> 2) That same fully loaded node should accept jobs in short.q until it is
>   50% overloaded, at which time short.q should also go into alarm.
> 
> 3) Conversely, if a node is running a full load of long.q jobs, it should
>   still accept a full load of lab.q jobs.
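(In np_load_avg terms this works out on a node without hyperthreading: a full 
load of lab.q jobs gives an np_load_avg of roughly 1.0, which is above long.q's 
0.9 threshold but below short.q's 1.25 and lab.q's 1.5, so only long.q goes 
into alarm; and a full load of long.q jobs likewise stays below lab.q's 1.5, so 
lab.q keeps accepting jobs.)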
> 
> As an example, here's a non-hyperthreaded node:
> 
> $ qhost -q -h iq116
> iq116                   linux-x64       8  9.93   15.6G    4.0G    4.0G  196.3M
>   lab.q                BP    0/8/8
>   short.q              BP    0/2/8
>   long.q               BP    0/0/8         a
> 
> lab.q is full and short.q is still accepting jobs, but long.q is in alarm, as 
> intended.  Here's a hyperthreaded node:
> 
> $ qhost -q -h msg-id1
> HOSTNAME                ARCH         NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> ----------------------------------------------------------------------------------------------
> global                  -               -    -    -    -     -       -       -       -       -
> msg-id1                 lx-amd64       48    2   24   48 24.52  251.6G    2.2G    4.0G     0.0
>   lab.q                BP    0/24/24
>   short.q              BP    0/0/24
>   long.q               BP    0/0/24
> 
> So even though lab.q is full, long.q isn't in alarm.  Here's how that node 
> shows up in qconf:
> 
> $ qconf -se msg-id1
> hostname              msg-id1.ic.ucsf.edu
> load_scaling          np_load_avg=2.000000
> complex_values        mem_free=256000M
> load_values           arch=lx-amd64,num_proc=48,mem_total=257673.273438M, \
>                       swap_total=4095.996094M,virtual_total=261769.269531M, \
>                       m_topology=SCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTT, \
>                       m_socket=2,m_core=24,m_thread=48,load_avg=24.520000, \
>                       load_short=24.490000,load_medium=24.520000, \
>                       load_long=24.500000,mem_free=255421.792969M, \
>                       swap_free=4095.996094M,virtual_free=259517.789062M, \
>                       mem_used=2251.480469M,swap_used=0.000000M, \
>                       virtual_used=2251.480469M,cpu=50.000000, \
>                       m_topology_inuse=SCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTT, \
>                       np_load_avg=0.510833,np_load_short=0.510208, \
>                       np_load_medium=0.510833,np_load_long=0.510417
> processors            48
> 
> Given I have both hyperthreaded and non-hyperthreaded nodes, I can't just 
> change the value of the queue's np_load_avg load_threshold.  I thought 
> load_scaling was the answer, but it's not having any effect that I can see.
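To put numbers on it with the values above: np_load_avg is load_avg divided by 
num_proc, i.e. 24.52 / 48 = 0.51 (the np_load_avg=0.510833 shown by `qconf -se`), 
so long.q's 0.9 threshold is never reached even with all 24 physical cores busy. 
With load_scaling np_load_avg=2.000000 applied, the scheduler should see roughly 
1.02 instead and put long.q into alarm, just as the scaled values showed up in 
my `qstat -explain a` test above.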
> 
> -- 
> Joshua Baker-LePain
> QB3 Shared Cluster Sysadmin
> UCSF


_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
