Hi Miguel,

I finally found the time to test the QOS NoDecay configuration against the GrpTRESMins account limit.
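For reference, the initialization steps of the benchmark below were done with commands along these lines (a sketch using the account "dci" and QOS "support" that appear in the outputs; note that sacctmgr only accepts RawUsage=0):

# reset the accrued usage on the account association and on the QOS
sacctmgr modify account where name=dci set RawUsage=0
sacctmgr modify qos where name=support set RawUsage=0
# set the account-level limit (later re-set below the usage accrued by the QOS)
sacctmgr modify account where name=dci set GrpTRESMins=cpu=4100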
Here is my benchmark:

1) Initialize the benchmark configuration:
- reset all RawUsage (on the QOS and on the account)
- set a limit on the account GrpTRESMins
- run several jobs with a controlled elapsed CPU time on a QOS
- reset the account RawUsage
- set the account GrpTRESMins limit below the QOS RawUsage

Here is the initial state before running the benchmark:

toto@login1:~/TEST$ sshare -A dci -u " " -o account,user,GrpTRESRaw%80,GrpTRESMins,rawusage
Account              User  GrpTRESRaw                                                      GrpTRESMins  RawUsage
-------------------- ----- --------------------------------------------------------------- -----------  --------
dci                        cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0   cpu=4100     0

Account RawUsage = 0
Account GrpTRESMins = cpu=4100

toto@login1:~/TEST$ scontrol -o show assoc_mgr | grep "^QOS" | grep support
QOS=support(8) UsageRaw=253632.000000 GrpJobs=N(0) GrpJobsAccrue=N(0) GrpSubmitJobs=N(0) GrpWall=N(132.10) GrpTRES=cpu=N(0),mem=N(0),energy=N(0),node=2106(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0) GrpTRESMins=cpu=N(4227),mem=N(7926000),energy=N(0),node=N(132),billing=N(4227),fs/disk=N(0),vmem=N(0),pages=N(0) GrpTRESRunMins=cpu=N(0),mem=N(0),energy=N(0),node=N(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0) MaxWallPJ=1440 MaxTRESPJ=node=700 MaxTRESPN= MaxTRESMinsPJ= MinPrioThresh= MinTRESPJ= PreemptMode=OFF Priority=10 Account Limits= dci={MaxJobsPA=N(0) MaxJobsAccruePA=N(0) MaxSubmitJobsPA=N(0) MaxTRESPA=cpu=N(0),mem=N(0),energy=N(0),node=N(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0)} User Limits= 1145={MaxJobsPU=N(0) MaxJobsAccruePU=N(0) MaxSubmitJobsPU=N(0) MaxTRESPU=cpu=N(0),mem=N(0),energy=N(0),node=2106(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0)}

QOS support UsageRaw = 253632 s = 4227 min
The QOS support UsageRaw (4227 CPU-minutes) is above the account GrpTRESMins limit (cpu=4100), so SLURM should prevent any job of this account from starting if the limit works as I expect.

2) Run the benchmark to check whether the GrpTRESMins account limit is enforced against the QOS RawUsage:

toto@login1:~/TEST$ sbatch TRESMIN.slurm
Submitted batch job 3687
toto@login1:~/TEST$ squeue
JOBID  ADMIN_COMM  MIN_MEMOR  SUBMIT_TIME          PRIORITY  PARTITION  QOS      USER  STATE    TIME_LIMIT  TIME  NODES  REASON  START_TIME
3687   BDW28       60000M     2022-06-30T19:36:42  1100000   bdw28      support  toto  RUNNING  5:00        0:02  1      None    2022-06-30T19:36:42

The job is running even though the account GrpTRESMins limit is below the QOS support RawUsage.

Is there anything wrong with my control process that invalidates the result?

Thanks
Gérard
http://www.cines.fr/
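P.S. For reproducibility, TRESMIN.slurm is essentially the following (a sketch reconstructed from the squeue output above; the actual script, job name and sleep duration may differ):

#!/bin/bash
#SBATCH --partition=bdw28    # partition shown by squeue
#SBATCH --qos=support        # QOS whose UsageRaw already exceeds the account limit
#SBATCH --account=dci        # account carrying the GrpTRESMins=cpu=4100 limit
#SBATCH --nodes=1
#SBATCH --mem=60000M         # MIN_MEMORY shown by squeue
#SBATCH --time=00:05:00      # matches the 5:00 TIME_LIMIT shown by squeue

# Burn a controlled amount of elapsed time so the accrued TRES minutes are predictable.
srun sleep 120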
> From: "gerard gil" <gerard....@cines.fr>
> To: "Slurm-users" <slurm-users@lists.schedmd.com>
> Sent: Wednesday 29 June 2022 19:13:56
> Subject: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage
>
> Hi Miguel,
>
>> If I understood you correctly your goal was to limit the number of minutes each
>> project can run. By associating each project to a slurm account with a nodecay
>> QoS then you will have achieved your goal.
>
> Here is what I want to do:
> "All jobs submitted to an account, regardless of the QOS they use, have to be
> constrained to a number of minutes set by the limit associated with that
> account (and not with the QOS)."
>
>> Try a project with a very small limit and you will see that it won't run.
>
> I already tested the GrpTRESMins limit and confirmed it works as expected.
> Then I saw the decay effect on GrpTRESRaw (which I first thought was the right
> metric to look at) and tried to find a way to fix it.
> It's really very important for me to trust it, so I need a deterministic test
> to prove it.
> I'm testing this GrpTRESMins limit with NoDecay set on the QOS, resetting all
> RawUsage (account and QOS), to be sure it works as I expect.
> I print the account GrpTRESRaw (in minutes) at the end of my test jobs, set a
> new GrpTRESMins limit, and see how it behaves.
> I'll report on the results. I hope it works.
>
>> You don't have to add anything.
>> Each QoS will accumulate its respective usage, i.e., the usage of all users on
>> that account. Users can even be on different accounts (projects) and charge the
>> respective project with the parameter --account on sbatch.
>
> If SLURM does this to enforce the limit, I would also like to obtain the
> current RawUsage for an account. Do you know how to get it?
>
>> The GrpTRESMins is always changed on the QoS with a command like:
>> sacctmgr update qos where qos=... set GrpTRESMin=cpu=...
>
> That's right if you want to set a limit on a QOS.
> But I don't think the same limit value will also apply to all the other QOS,
> and I don't know what happens if I apply the same limit to all of them.
> Is my account limit the sum of all the QOS limits?
> Actually I'm setting the limit on the account with the command:
> sacctmgr modify account myaccount set grptresmins=cpu=60000 qos=...
> With this setting I saw that the limit is set on the account and not on the
> QOS: the sacctmgr show QOS command shows an empty GrpTRESMins field for all QOS.
>
> Thanks again for your help.
> I hope I'm close to getting the answer to my issue.
> Best,
> Gérard
> http://www.cines.fr/
>
>> From: "Miguel Oliveira" <miguel.olive...@uc.pt>
>> To: "Slurm-users" <slurm-users@lists.schedmd.com>
>> Sent: Wednesday 29 June 2022 01:28:58
>> Subject: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage
>>
>> Hi Gérard,
>> If I understood you correctly your goal was to limit the number of minutes each
>> project can run. By associating each project to a slurm account with a nodecay
>> QoS then you will have achieved your goal.
>> Try a project with a very small limit and you will see that it won't run.
>> You don't have to add anything. Each QoS will accumulate its respective usage,
>> i.e., the usage of all users on that account. Users can even be on different
>> accounts (projects) and charge the respective project with the parameter
>> --account on sbatch.
>> The GrpTRESMins is always changed on the QoS with a command like:
>> sacctmgr update qos where qos=... set GrpTRESMin=cpu=...
>> Hope that makes sense!
>> Best,
>> MAO
>>
>>> On 28 Jun 2022, at 18:30, gerard....@cines.fr wrote:
>>> Hi Miguel,
>>> OK, I didn't know this command.
>>> I'm not sure I understand how it works with regard to my goal.
>>> I used the following command, inspired by the one you gave me, and I obtain a
>>> UsageRaw for each QOS:
>>> scontrol -o show assoc_mgr accounts=myaccount users=" "
>>> Do I have to sum up all the QOS RawUsage values to obtain the RawUsage of
>>> myaccount with NoDecay?
>>> If I set GrpTRESMins for an account and not for a QOS, does SLURM sum up these
>>> QOS RawUsage values to check whether the GrpTRESMins account limit is reached?
>>> Thanks again for your precious help.
>>> Gérard
>>> http://www.cines.fr/
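>>> P.S. For now I sum the per-QOS usage with a pipeline along these lines (a
>>> sketch; UsageRaw is reported in seconds, and this sums every QOS shown by
>>> scontrol, not only the ones used by myaccount):
>>> scontrol -o show assoc_mgr flags=qos | grep -o "UsageRaw=[0-9.]*" | awk -F= '{s+=$2} END {print s/60, "min"}'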
>>>> From: "Miguel Oliveira" <miguel.olive...@uc.pt>
>>>> To: "Slurm-users" <slurm-users@lists.schedmd.com>
>>>> Sent: Tuesday 28 June 2022 17:23:18
>>>> Subject: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage
>>>>
>>>> Hi Gérard,
>>>> The way you are checking is against the association, and as such it ought to
>>>> be decreasing in order to be used by fair share appropriately.
>>>> The counter that does not decrease is on the QoS, not the association. You
>>>> can check it with:
>>>> scontrol -o show assoc_mgr | grep "^QOS=<account>"
>>>> That ought to give you two numbers for each limit. The first is the limit, or
>>>> N for no limit, and the second, in parentheses, is the usage.
>>>> Hope that helps.
>>>> Best,
>>>> Miguel Afonso Oliveira
>>>>
>>>>> On 28 Jun 2022, at 08:58, gerard....@cines.fr wrote:
>>>>> Hi Miguel,
>>>>> I modified my test configuration to evaluate the effect of NoDecay:
>>>>> I added the NoDecay flag to all QOS.
>>>>> toto@login1:~/TEST$ sacctmgr show QOS
>>>>> (columns that are empty for every QOS are omitted below)
>>>>> Name        Priority  GraceTime  PreemptMode  Flags    UsageFactor  GrpTRES    MaxTRES    MaxWall     MaxTRESPU
>>>>> ----------  --------  ---------  -----------  -------  -----------  ---------  ---------  ----------  ---------
>>>>> normal      0         00:00:00   cluster      NoDecay  1.000000
>>>>> interactif  10        00:00:00   cluster      NoDecay  1.000000     node=50    node=22    1-00:00:00  node=50
>>>>> petit       4         00:00:00   cluster      NoDecay  1.000000     node=1500  node=22    1-00:00:00  node=300
>>>>> gros        6         00:00:00   cluster      NoDecay  1.000000     node=2106  node=700   1-00:00:00  node=700
>>>>> court       8         00:00:00   cluster      NoDecay  1.000000     node=1100  node=100   02:00:00    node=300
>>>>> long        4         00:00:00   cluster      NoDecay  1.000000     node=500   node=200   5-00:00:00  node=200
>>>>> special     10        00:00:00   cluster      NoDecay  1.000000     node=2106  node=2106  5-00:00:00  node=2106
>>>>> support     10        00:00:00   cluster      NoDecay  1.000000     node=2106  node=700   1-00:00:00  node=2106
>>>>> visu        10        00:00:00   cluster      NoDecay  1.000000     node=4     node=700   06:00:00    node=4
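>>>>> For reference, the flag can be added to every QOS with a loop along these
>>>>> lines (a sketch; -i makes sacctmgr commit without asking for confirmation):
>>>>> for q in normal interactif petit gros court long special support visu; do
>>>>>     sacctmgr -i modify qos where name=$q set Flags=NoDecay
>>>>> done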
>>>>> I submitted a bunch of jobs to check the effect of NoDecay, and I noticed
>>>>> that RawUsage as well as GrpTRESRaw cpu are still decreasing:
>>>>>
>>>>> toto@login1:~/TEST$ sshare -A dci -u " " -o account,user,GrpTRESRaw%80,GrpTRESMins,RawUsage
>>>>> Account  User  GrpTRESRaw                                                                     GrpTRESMins  RawUsage
>>>>> dci            cpu=6932,mem=12998963,energy=0,node=216,billing=6932,fs/disk=0,vmem=0,pages=0  cpu=17150    415966
>>>>>
>>>>> toto@login1:~/TEST$ sshare -A dci -u " " -o account,user,GrpTRESRaw%80,GrpTRESMins,RawUsage
>>>>> Account  User  GrpTRESRaw                                                                     GrpTRESMins  RawUsage
>>>>> dci            cpu=6931,mem=12995835,energy=0,node=216,billing=6931,fs/disk=0,vmem=0,pages=0  cpu=17150    415866
>>>>>
>>>>> toto@login1:~/TEST$ sshare -A dci -u " " -o account,user,GrpTRESRaw%80,GrpTRESMins,RawUsage
>>>>> Account  User  GrpTRESRaw                                                                     GrpTRESMins  RawUsage
>>>>> dci            cpu=6929,mem=12992708,energy=0,node=216,billing=6929,fs/disk=0,vmem=0,pages=0  cpu=17150    415766
>>>>>
>>>>> Is there something I forgot to do?
>>>>> Best,
>>>>> Gérard
>>>>>
>>>>> Kind regards,
>>>>> Gérard Gil
>>>>> Département Calcul Intensif
>>>>> Centre Informatique National de l'Enseignement Superieur
>>>>> 950, rue de Saint Priest
>>>>> 34097 Montpellier CEDEX 5
>>>>> FRANCE
>>>>> tel: (334) 67 14 14 14
>>>>> fax: (334) 67 52 37 63
>>>>> web: http://www.cines.fr
>>>>>
>>>>>> From: "Gérard Gil" <gerard....@cines.fr>
>>>>>> To: "Slurm-users" <slurm-users@lists.schedmd.com>
>>>>>> Cc: "slurm-users" <slurm-us...@schedmd.com>
>>>>>> Sent: Friday 24 June 2022 14:52:12
>>>>>> Subject: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage
>>>>>>
>>>>>> Hi Miguel,
>>>>>> Good!
>>>>>> I'll try these options on all existing QOS and see if everything works as
>>>>>> expected. I'll let you know the results.
>>>>>> Thanks a lot.
>>>>>> Best,
>>>>>> Gérard
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: "Miguel Oliveira" <miguel.olive...@uc.pt>
>>>>>>> To: "Slurm-users" <slurm-users@lists.schedmd.com>
>>>>>>> Cc: "slurm-users" <slurm-us...@schedmd.com>
>>>>>>> Sent: Friday 24 June 2022 14:07:16
>>>>>>> Subject: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage
>>>>>>>
>>>>>>> Hi Gérard,
>>>>>>> I believe so. All our accounts correspond to one project, and all have an
>>>>>>> associated QoS with NoDecay and DenyOnLimit. This is enough to restrict
>>>>>>> usage on each individual project.
>>>>>>> You only need these flags on the QoS. The association will carry on as
>>>>>>> usual and fairshare will not be impacted.
>>>>>>> Hope that helps,
>>>>>>> Miguel Oliveira
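>>>>>>> P.S. As a sketch, the per-project setup amounts to something like this
>>>>>>> ("myproj" and "myproj-qos" are placeholder names; the cpu value is only
>>>>>>> an example):
>>>>>>> sacctmgr add qos myproj-qos Flags=NoDecay,DenyOnLimit GrpTRESMins=cpu=60000
>>>>>>> sacctmgr modify account where name=myproj set QOS=myproj-qos DefaultQOS=myproj-qos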
>>>>>>>> "If all configured QOS use NoDecay, we can take advantage of the >>>>>>>> FairShare >>>>>>>> priority with Decay and all jobs GrpTRESRaw with NoDecay ?" >>>>>>>> Thanks >>>>>>>> Best, >>>>>> > > Gérard