Hi Miguel,

I finally found the time to test the QOS NoDecay configuration against the GrpTRESMins account limit.
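For reference, the initialization steps of the benchmark below were done with commands along these lines (a sketch using the account "dci" and QOS "support" that appear in the outputs; note that sacctmgr only accepts RawUsage=0):

# reset the accrued usage on the account association and on the QOS
sacctmgr modify account where name=dci set RawUsage=0
sacctmgr modify qos where name=support set RawUsage=0
# set the account-level limit (later re-set below the usage accrued by the QOS)
sacctmgr modify account where name=dci set GrpTRESMins=cpu=4100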
Here is my benchmark:

1) Initialize the benchmark configuration:
- reset all RawUsage (on the QOS and on the account)
- set a limit on the account GrpTRESMins
- run several jobs with a controlled elapsed CPU time on a QOS
- reset the account RawUsage
- set the account GrpTRESMins limit below the QOS RawUsage

Here is the initial state before running the benchmark:

toto@login1:~/TEST$ sshare -A dci -u " " -o account,user,GrpTRESRaw%80,GrpTRESMins,rawusage
Account              User  GrpTRESRaw                                                      GrpTRESMins  RawUsage
-------------------- ----- --------------------------------------------------------------- -----------  --------
dci                        cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0   cpu=4100     0

Account RawUsage = 0
Account GrpTRESMins = cpu=4100

toto@login1:~/TEST$ scontrol -o show assoc_mgr | grep "^QOS" | grep support
QOS=support(8) UsageRaw=253632.000000 GrpJobs=N(0) GrpJobsAccrue=N(0) GrpSubmitJobs=N(0) GrpWall=N(132.10) GrpTRES=cpu=N(0),mem=N(0),energy=N(0),node=2106(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0) GrpTRESMins=cpu=N(4227),mem=N(7926000),energy=N(0),node=N(132),billing=N(4227),fs/disk=N(0),vmem=N(0),pages=N(0) GrpTRESRunMins=cpu=N(0),mem=N(0),energy=N(0),node=N(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0) MaxWallPJ=1440 MaxTRESPJ=node=700 MaxTRESPN= MaxTRESMinsPJ= MinPrioThresh= MinTRESPJ= PreemptMode=OFF Priority=10 Account Limits= dci={MaxJobsPA=N(0) MaxJobsAccruePA=N(0) MaxSubmitJobsPA=N(0) MaxTRESPA=cpu=N(0),mem=N(0),energy=N(0),node=N(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0)} User Limits= 1145={MaxJobsPU=N(0) MaxJobsAccruePU=N(0) MaxSubmitJobsPU=N(0) MaxTRESPU=cpu=N(0),mem=N(0),energy=N(0),node=2106(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0)}

QOS support UsageRaw = 253632 s = 4227 min
The QOS support UsageRaw (4227 CPU-minutes) is above the account GrpTRESMins limit (cpu=4100), so SLURM should prevent any job of this account from starting if the limit works as I expect.

2) Run the benchmark to check whether the GrpTRESMins account limit is enforced against the QOS RawUsage:

toto@login1:~/TEST$ sbatch TRESMIN.slurm
Submitted batch job 3687
toto@login1:~/TEST$ squeue
JOBID  ADMIN_COMM  MIN_MEMOR  SUBMIT_TIME          PRIORITY  PARTITION  QOS      USER  STATE    TIME_LIMIT  TIME  NODES  REASON  START_TIME
3687   BDW28       60000M     2022-06-30T19:36:42  1100000   bdw28      support  toto  RUNNING  5:00        0:02  1      None    2022-06-30T19:36:42

The job is running even though the account GrpTRESMins limit is below the QOS support RawUsage.

Is there anything wrong with my control process that invalidates the result?

Thanks
Gérard
http://www.cines.fr/
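P.S. For reproducibility, TRESMIN.slurm is essentially the following (a sketch reconstructed from the squeue output above; the actual script, job name and sleep duration may differ):

#!/bin/bash
#SBATCH --partition=bdw28    # partition shown by squeue
#SBATCH --qos=support        # QOS whose UsageRaw already exceeds the account limit
#SBATCH --account=dci        # account carrying the GrpTRESMins=cpu=4100 limit
#SBATCH --nodes=1
#SBATCH --mem=60000M         # MIN_MEMORY shown by squeue
#SBATCH --time=00:05:00      # matches the 5:00 TIME_LIMIT shown by squeue

# Burn a controlled amount of elapsed time so the accrued TRES minutes are predictable.
srun sleep 120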
> From: "gerard gil" <gerard....@cines.fr>
> To: "Slurm-users" <slurm-users@lists.schedmd.com>
> Sent: Wednesday 29 June 2022 19:13:56
> Subject: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage
>
> Hi Miguel,
>
>> If I understood you correctly your goal was to limit the number of minutes each
>> project can run. By associating each project to a slurm account with a nodecay
>> QoS then you will have achieved your goal.
>
> Here is what I want to do:
> "All jobs submitted to an account, regardless of the QOS they use, have to be
> constrained to a number of minutes set by the limit associated with that
> account (and not with the QOS)."
>
>> Try a project with a very small limit and you will see that it won't run.
>
> I already tested the GrpTRESMins limit and confirmed it works as expected.
> Then I saw the decay effect on GrpTRESRaw (which I first thought was the right
> metric to look at) and tried to find a way to fix it.
> It's really very important for me to trust it, so I need a deterministic test
> to prove it.
> I'm testing this GrpTRESMins limit with NoDecay set on the QOS, resetting all
> RawUsage (account and QOS), to be sure it works as I expect.
> I print the account GrpTRESRaw (in minutes) at the end of my test jobs, set a
> new GrpTRESMins limit, and see how it behaves.
> I'll report on the results. I hope it works.
>
>> You don't have to add anything.
>> Each QoS will accumulate its respective usage, i.e., the usage of all users on
>> that account. Users can even be on different accounts (projects) and charge the
>> respective project with the parameter --account on sbatch.
>
> If SLURM does this to enforce the limit, I would also like to obtain the
> current RawUsage for an account. Do you know how to get it?
>
>> The GrpTRESMins is always changed on the QoS with a command like:
>> sacctmgr update qos where qos=... set GrpTRESMin=cpu=...
>
> That's right if you want to set a limit on a QOS.
> But I don't think the same limit value will also apply to all the other QOS,
> and I don't know what happens if I apply the same limit to all of them.
> Is my account limit the sum of all the QOS limits?
> Actually I'm setting the limit on the account with the command:
> sacctmgr modify account myaccount set grptresmins=cpu=60000 qos=...
> With this setting I saw that the limit is set on the account and not on the
> QOS: the sacctmgr show QOS command shows an empty GrpTRESMins field for all QOS.
>
> Thanks again for your help.
> I hope I'm close to getting the answer to my issue.
> Best,
> Gérard
> http://www.cines.fr/
>
>> From: "Miguel Oliveira" <miguel.olive...@uc.pt>
>> To: "Slurm-users" <slurm-users@lists.schedmd.com>
>> Sent: Wednesday 29 June 2022 01:28:58
>> Subject: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage
>>
>> Hi Gérard,
>> If I understood you correctly your goal was to limit the number of minutes each
>> project can run. By associating each project to a slurm account with a nodecay
>> QoS then you will have achieved your goal.
>> Try a project with a very small limit and you will see that it won't run.
>> You don't have to add anything. Each QoS will accumulate its respective usage,
>> i.e., the usage of all users on that account. Users can even be on different
>> accounts (projects) and charge the respective project with the parameter
>> --account on sbatch.
>> The GrpTRESMins is always changed on the QoS with a command like:
>> sacctmgr update qos where qos=... set GrpTRESMin=cpu=...
>> Hope that makes sense!
>> Best,
>> MAO
>>
>>> On 28 Jun 2022, at 18:30, gerard....@cines.fr wrote:
>>> Hi Miguel,
>>> OK, I didn't know this command.
>>> I'm not sure I understand how it works with regard to my goal.
>>> I used the following command, inspired by the one you gave me, and I obtain a
>>> UsageRaw for each QOS:
>>> scontrol -o show assoc_mgr accounts=myaccount users=" "
>>> Do I have to sum up all the QOS RawUsage values to obtain the RawUsage of
>>> myaccount with NoDecay?
>>> If I set GrpTRESMins for an account and not for a QOS, does SLURM sum up these
>>> QOS RawUsage values to check whether the GrpTRESMins account limit is reached?
>>> Thanks again for your precious help.
>>> Gérard
>>> http://www.cines.fr/
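>>> P.S. For now I sum the per-QOS usage with a pipeline along these lines (a
>>> sketch; UsageRaw is reported in seconds, and this sums every QOS shown by
>>> scontrol, not only the ones used by myaccount):
>>> scontrol -o show assoc_mgr flags=qos | grep -o "UsageRaw=[0-9.]*" | awk -F= '{s+=$2} END {print s/60, "min"}'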
>>>> From: "Miguel Oliveira" <miguel.olive...@uc.pt>
>>>> To: "Slurm-users" <slurm-users@lists.schedmd.com>
>>>> Sent: Tuesday 28 June 2022 17:23:18
>>>> Subject: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage
>>>>
>>>> Hi Gérard,
>>>> The way you are checking is against the association, and as such it ought to
>>>> be decreasing in order to be used by fair share appropriately.
>>>> The counter that does not decrease is on the QoS, not the association. You
>>>> can check it with:
>>>> scontrol -o show assoc_mgr | grep "^QOS=<account>"
>>>> That ought to give you two numbers for each limit. The first is the limit, or
>>>> N for no limit, and the second, in parentheses, is the usage.
>>>> Hope that helps.
>>>> Best,
>>>> Miguel Afonso Oliveira
>>>>
>>>>> On 28 Jun 2022, at 08:58, gerard....@cines.fr wrote:
>>>>> Hi Miguel,
>>>>> I modified my test configuration to evaluate the effect of NoDecay:
>>>>> I added the NoDecay flag to all QOS.
>>>>> toto@login1:~/TEST$ sacctmgr show QOS
>>>>> (columns that are empty for every QOS are omitted below)
>>>>> Name        Priority  GraceTime  PreemptMode  Flags    UsageFactor  GrpTRES    MaxTRES    MaxWall     MaxTRESPU
>>>>> ----------  --------  ---------  -----------  -------  -----------  ---------  ---------  ----------  ---------
>>>>> normal      0         00:00:00   cluster      NoDecay  1.000000
>>>>> interactif  10        00:00:00   cluster      NoDecay  1.000000     node=50    node=22    1-00:00:00  node=50
>>>>> petit       4         00:00:00   cluster      NoDecay  1.000000     node=1500  node=22    1-00:00:00  node=300
>>>>> gros        6         00:00:00   cluster      NoDecay  1.000000     node=2106  node=700   1-00:00:00  node=700
>>>>> court       8         00:00:00   cluster      NoDecay  1.000000     node=1100  node=100   02:00:00    node=300
>>>>> long        4         00:00:00   cluster      NoDecay  1.000000     node=500   node=200   5-00:00:00  node=200
>>>>> special     10        00:00:00   cluster      NoDecay  1.000000     node=2106  node=2106  5-00:00:00  node=2106
>>>>> support     10        00:00:00   cluster      NoDecay  1.000000     node=2106  node=700   1-00:00:00  node=2106
>>>>> visu        10        00:00:00   cluster      NoDecay  1.000000     node=4     node=700   06:00:00    node=4
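>>>>> For reference, the flag can be added to every QOS with a loop along these
>>>>> lines (a sketch; -i makes sacctmgr commit without asking for confirmation):
>>>>> for q in normal interactif petit gros court long special support visu; do
>>>>>     sacctmgr -i modify qos where name=$q set Flags=NoDecay
>>>>> done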
>>>>> I submitted a bunch of jobs to check the effect of NoDecay, and I noticed
>>>>> that RawUsage as well as GrpTRESRaw cpu are still decreasing:
>>>>>
>>>>> toto@login1:~/TEST$ sshare -A dci -u " " -o account,user,GrpTRESRaw%80,GrpTRESMins,RawUsage
>>>>> Account  User  GrpTRESRaw                                                                     GrpTRESMins  RawUsage
>>>>> dci            cpu=6932,mem=12998963,energy=0,node=216,billing=6932,fs/disk=0,vmem=0,pages=0  cpu=17150    415966
>>>>>
>>>>> toto@login1:~/TEST$ sshare -A dci -u " " -o account,user,GrpTRESRaw%80,GrpTRESMins,RawUsage
>>>>> Account  User  GrpTRESRaw                                                                     GrpTRESMins  RawUsage
>>>>> dci            cpu=6931,mem=12995835,energy=0,node=216,billing=6931,fs/disk=0,vmem=0,pages=0  cpu=17150    415866
>>>>>
>>>>> toto@login1:~/TEST$ sshare -A dci -u " " -o account,user,GrpTRESRaw%80,GrpTRESMins,RawUsage
>>>>> Account  User  GrpTRESRaw                                                                     GrpTRESMins  RawUsage
>>>>> dci            cpu=6929,mem=12992708,energy=0,node=216,billing=6929,fs/disk=0,vmem=0,pages=0  cpu=17150    415766
>>>>>
>>>>> Is there something I forgot to do?
>>>>> Best,
>>>>> Gérard
>>>>>
>>>>> Kind regards,
>>>>> Gérard Gil
>>>>> Département Calcul Intensif
>>>>> Centre Informatique National de l'Enseignement Superieur
>>>>> 950, rue de Saint Priest
>>>>> 34097 Montpellier CEDEX 5
>>>>> FRANCE
>>>>> tel: (334) 67 14 14 14
>>>>> fax: (334) 67 52 37 63
>>>>> web: http://www.cines.fr
>>>>>
>>>>>> From: "Gérard Gil" <gerard....@cines.fr>
>>>>>> To: "Slurm-users" <slurm-users@lists.schedmd.com>
>>>>>> Cc: "slurm-users" <slurm-us...@schedmd.com>
>>>>>> Sent: Friday 24 June 2022 14:52:12
>>>>>> Subject: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage
>>>>>>
>>>>>> Hi Miguel,
>>>>>> Good!
>>>>>> I'll try these options on all existing QOS and see if everything works as
>>>>>> expected. I'll let you know the results.
>>>>>> Thanks a lot.
>>>>>> Best,
>>>>>> Gérard
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: "Miguel Oliveira" <miguel.olive...@uc.pt>
>>>>>>> To: "Slurm-users" <slurm-users@lists.schedmd.com>
>>>>>>> Cc: "slurm-users" <slurm-us...@schedmd.com>
>>>>>>> Sent: Friday 24 June 2022 14:07:16
>>>>>>> Subject: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage
>>>>>>>
>>>>>>> Hi Gérard,
>>>>>>> I believe so. All our accounts correspond to one project, and all have an
>>>>>>> associated QoS with NoDecay and DenyOnLimit. This is enough to restrict
>>>>>>> usage on each individual project.
>>>>>>> You only need these flags on the QoS. The association will carry on as
>>>>>>> usual and fairshare will not be impacted.
>>>>>>> Hope that helps,
>>>>>>> Miguel Oliveira
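>>>>>>> P.S. As a sketch, the per-project setup amounts to something like this
>>>>>>> ("myproj" and "myproj-qos" are placeholder names; the cpu value is only
>>>>>>> an example):
>>>>>>> sacctmgr add qos myproj-qos Flags=NoDecay,DenyOnLimit GrpTRESMins=cpu=60000
>>>>>>> sacctmgr modify account where name=myproj set QOS=myproj-qos DefaultQOS=myproj-qos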
>>>>>>>> "If all configured QOS use NoDecay, we can take advantage of the >>>>>>>> FairShare >>>>>>>> priority with Decay and all jobs GrpTRESRaw with NoDecay ?" >>>>>>>> Thanks >>>>>>>> Best, >>>>>> > > Gérard