Hi Guillaume,
as Rob already mentioned, this could be a way for you (the partition was
just created temporarily online for testing). You could also add your
MaxTRES=node=1 for further restriction. We do something similar with a
QOS to restrict the number of CPUs per user in certain partitions.
sacctmgr create qos name=maxtrespu200G maxtrespu=mem=200G flags=denyonlimit
scontrol create partition=testtres qos=maxtrespu200g maxtime=08:00:00 nodes=lt[10000-10003] DefMemPerCPU=940 MaxMemPerCPU=940 OverSubscribe=NO
That results in:

4 jobs with 100G each:
---
[root@levantetest ~]# squeue
 JOBID PARTITION     NAME     USER ST   TIME NODES NODELIST(REASON)
   862  testtres hostname  xxxxxxx PD   0:00     1 (QOSMaxMemoryPerUser)
   861  testtres hostname  xxxxxxx PD   0:00     1 (QOSMaxMemoryPerUser)
   860  testtres hostname  xxxxxxx  R   0:15     1 lt10000
   859  testtres hostname  xxxxxxx  R   0:22     1 lt10000
6 jobs with 50G each:
---
[k202068@levantetest ~]$ squeue
 JOBID PARTITION     NAME     USER ST   TIME NODES NODELIST(REASON)
   876  testtres hostname  xxxxxxx PD   0:00     1 (QOSMaxMemoryPerUser)
   875  testtres hostname  xxxxxxx PD   0:00     1 (QOSMaxMemoryPerUser)
   874  testtres hostname  xxxxxxx  R   9:09     1 lt10000
   873  testtres hostname  xxxxxxx  R   9:15     1 lt10000
   872  testtres hostname  xxxxxxx  R   9:22     1 lt10000
   871  testtres hostname  xxxxxxx  R   9:26     1 lt10000
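
If you also want each job confined to a single node, a per-job MaxTRES could be added to the same QOS. A sketch only (not part of the test above, so untested in this setup):

# assumption: a per-job MaxTRES on the same QOS caps each job at one node
sacctmgr modify qos maxtrespu200G set maxtres=node=1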
Best Regards,
Carsten
--
Carsten Beyer
Abteilung Systeme
Deutsches Klimarechenzentrum GmbH (DKRZ)
Bundesstraße 45a * D-20146 Hamburg * Germany
Phone: +49 40 460094-221
Fax: +49 40 460094-270
Email: be...@dkrz.de
URL: http://www.dkrz.de
Geschäftsführer: Prof. Dr. Thomas Ludwig
Sitz der Gesellschaft: Hamburg
Amtsgericht Hamburg HRB 39784
On 24.09.24 at 16:58, Guillaume COCHARD via slurm-users wrote:
> "So if they submit a 2^nd job, that job can start but will have to
go onto another node, and will again be restricted to 200G? So they
can start as many jobs as there are nodes, and each job will be
restricted to using 1 node and 200G of memory?"
Yes, that's it. We already have MaxNodes=1, so a job can't be spread
across multiple nodes.
To be more precise, the limit should be per user, not per job. To
illustrate, imagine we have 3 empty nodes and a 200G-per-user-per-node
limit. If a user submits 10 jobs, each requesting 100G of memory, there
should be 2 jobs running on each node and 4 jobs pending.
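
For concreteness, that scenario could be reproduced with something like the following (illustrative only; partition and other defaults omitted):

# submit 10 identical 100G jobs; with a 200G/user/node cap on 3 empty nodes,
# the expectation is 6 running (2 per node) and 4 pending
for i in $(seq 1 10); do
    sbatch --mem=100G --wrap="sleep 3600"
done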
Guillaume
------------------------------------------------------------------------
*From:* "Groner, Rob" <rug...@psu.edu>
*To:* "Guillaume COCHARD" <guillaume.coch...@cc.in2p3.fr>
*Cc:* slurm-users@lists.schedmd.com
*Sent:* Tuesday, September 24, 2024 16:37:34
*Subject:* Re: Max TRES per user and node
Ah, sorry, I didn't catch that from your first post (though you did
say it).
So, you are trying to limit the user to no more than 200G of memory on
a single node? So if they submit a 2nd job, that job can start but
will have to go onto another node, and will again be restricted to
200G? So they can start as many jobs as there are nodes, and each job
will be restricted to using 1 node and 200G of memory? Or can they
submit a job asking for 4 nodes, where they are limited to 200G on
each node? Or are they limited to a single node, no matter how many jobs?
Rob
------------------------------------------------------------------------
*From:* Guillaume COCHARD <guillaume.coch...@cc.in2p3.fr>
*Sent:* Tuesday, September 24, 2024 10:09 AM
*To:* Groner, Rob <rug...@psu.edu>
*Cc:* slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
*Subject:* Re: Max TRES per user and node
Thank you for your answer.
To test it, I tried:
sacctmgr update qos normal set maxtresperuser=cpu=2
# Then in slurm.conf
PartitionName=test […] qos=normal
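
(For reference, the limit actually stored on the QOS can be inspected with something like:

# should list cpu=2 under MaxTRESPU for the normal QOS
sacctmgr show qos normal format=Name,MaxTRESPU
)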
But then if I submit several 1-CPU jobs, only two start and the others
stay pending, even though I have several nodes available. So it seems
that MaxTRESPerUser is a QOS-wide limit; it doesn't limit TRES per
user and per node, but rather per user and QOS (or rather per partition,
since I applied the QOS to the partition). Did I miss something?
Thanks again,
Guillaume
------------------------------------------------------------------------
*From:* "Groner, Rob" <rug...@psu.edu>
*To:* slurm-users@lists.schedmd.com, "Guillaume COCHARD"
<guillaume.coch...@cc.in2p3.fr>
*Sent:* Tuesday, September 24, 2024 15:45:08
*Subject:* Re: Max TRES per user and node
You have the right idea.
On that same page, you'll find MaxTRESPerUser, as a QOS parameter.
You can create a QOS with the restrictions you'd like, and then in the
partition definition, you give it that QOS. The QOS will then apply
its restrictions to any jobs that use that partition.
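
A minimal sketch of that setup, with example names and values (adjust to your site):

# QOS carrying the per-user limit (name and value are examples only)
sacctmgr create qos name=memcap maxtrespu=mem=200G flags=denyonlimit

# slurm.conf: partition definition pointing at the QOS
PartitionName=test Nodes=node[01-03] QOS=memcap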
Rob
------------------------------------------------------------------------
*From:* Guillaume COCHARD via slurm-users <slurm-users@lists.schedmd.com>
*Sent:* Tuesday, September 24, 2024 9:30 AM
*To:* slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
*Subject:* [slurm-users] Max TRES per user and node
Hello,
We are looking for a method to limit the TRES used by each user on a
per-node basis. For example, we would like to limit the total memory
allocation of jobs from a user to 200G per node.
There is MaxTRESPerNode
(https://slurm.schedmd.com/sacctmgr.html#OPT_MaxTRESPerNode), but
unfortunately this is a per-job limit, not a per-user one.
Ideally, we would like to apply this limit on partitions and/or QoS.
Does anyone know if this is possible and how to achieve it?
Thank you,
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com