On 04/04/2024 at 03:33, Loris Bennett via slurm-users wrote:
I have never really understood the approach of having different
partitions for different lengths of job, but it seems to be quite
widespread, so I assume there are valid use cases.
However, for our roughly 450 users, about 200 of whom will submit at
least one job in a given month, we take an alternative approach without
preemption, where we essentially have just a single partition. Users
can then specify a QOS which increases priority at the cost of
accepting lower caps on the number of jobs, resources and maximum runtime:
$ sqos
      Name   Priority     MaxWall MaxJobs MaxSubmit            MaxTRESPU
---------- ---------- ----------- ------- --------- --------------------
    hiprio     100000    03:00:00      50       100   cpu=128,gres/gpu=4
      prio       1000  3-00:00:00     500      1000   cpu=256,gres/gpu=8
  standard          0 14-00:00:00    2000     10000  cpu=768,gres/gpu=16
where
alias sqos='sacctmgr show qos
format=name,priority,maxwall,maxjobs,maxsubmitjobs,maxtrespu%20'
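For illustration, a user wanting a short job to start sooner under this
scheme would just pass the QOS name at submission time (the job script
name here is a placeholder):

```shell
# Submit under the 'hiprio' QOS from the table above: higher priority,
# but subject to the tighter per-user caps (3 h walltime, 50 jobs, etc.)
sbatch --qos=hiprio --time=02:00:00 my_job.sh

# Users can check which QOSes their association allows them to use:
sacctmgr show assoc where user=$USER format=user,account,qos%40
```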
The standard cap on the resources corresponds to about 1/7 of our cores.
The downside is that very occasionally nodes may idle because a user has
reached his or her cap. However, we usually have enough users below
their caps submitting jobs that in practice this happens only rarely,
e.g. sometimes around Christmas or New Year.
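For anyone wanting to try something similar, the QOSes in the table
could be created roughly like this (a sketch only, using the limits
shown above and a placeholder user name; check the sacctmgr man page
for your Slurm version):

```shell
# Create the three QOSes with the per-user limits from the table above
sacctmgr add qos hiprio set Priority=100000 MaxWall=03:00:00 \
    MaxJobsPU=50 MaxSubmitJobsPU=100 MaxTRESPU=cpu=128,gres/gpu=4
sacctmgr add qos prio set Priority=1000 MaxWall=3-00:00:00 \
    MaxJobsPU=500 MaxSubmitJobsPU=1000 MaxTRESPU=cpu=256,gres/gpu=8
sacctmgr add qos standard set Priority=0 MaxWall=14-00:00:00 \
    MaxJobsPU=2000 MaxSubmitJobsPU=10000 MaxTRESPU=cpu=768,gres/gpu=16

# Allow the QOSes on a user's association and make 'standard' the default
sacctmgr modify user someuser set qos+=hiprio,prio,standard defaultqos=standard
```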
Cheers,
Loris
Hi Loris, Tomas
I'm also new to using the Slurm scheduler.
In your configuration, you have to define a DefaultQOS for each user or
association, right? You don't define a DefaultQOS at the partition level?
Thanks!
--
-- Jérôme
Love is like soup: the first spoonfuls are too hot,
the last ones too cold.
(Jeanne Moreau)
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com