[slurm-users] Re: cpus and gpus partitions and how to optimize the resource usage

2025-03-31 Thread Paul Edmon via slurm-users
To me, at least, the simplest solution would be to create 3 partitions: the first for the CPU-only nodes, the second for the GPU nodes, and the third a lower-priority requeue partition. This is how we do it here. This way the requeue partition can be used to grab the CPUs on the GPU nodes wi
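
A minimal slurm.conf sketch of that layout, with hypothetical node and partition names; the PriorityTier values and the PreemptMode=REQUEUE setting on the overlay partition are assumptions about how the lower-priority requeue partition would be made preemptable:

    # slurm.conf (sketch; node names are hypothetical)
    NodeName=cpu[01-10] CPUs=64
    NodeName=gpu[01-04] CPUs=64 Gres=gpu:4

    # Dedicated partitions per node type
    PartitionName=cpu     Nodes=cpu[01-10]            PriorityTier=10
    PartitionName=gpu     Nodes=gpu[01-04]            PriorityTier=10

    # Low-priority overlay spanning all nodes; jobs here are
    # requeued when higher-tier work needs the resources
    PartitionName=requeue Nodes=cpu[01-10],gpu[01-04] PriorityTier=1 PreemptMode=REQUEUE

With PreemptType=preempt/partition_prio set globally, jobs submitted to the requeue partition can soak up the idle CPUs on the GPU nodes and are requeued when jobs in the higher-tier partitions need those resources.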

[slurm-users] Re: cpus and gpus partitions and how to optimize the resource usage

2025-03-31 Thread Paul Raines via slurm-users
What I have done is set up partition QOSes for nodes with 4 GPUs and 64 cores:

    sacctmgr add qos lcncpu-part
    sacctmgr modify qos lcncpu-part set priority=20 \
        flags=DenyOnLimit MaxTRESPerNode=cpu=32,gres/gpu=0
    sacctmgr add qos lcngpu-part
    sacctmgr modify qos lcngpu-part set priority=20 \
        flag
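
The preview cuts off mid-command, but the mechanism is that each QOS is attached to its partition as a partition QOS in slurm.conf, so the DenyOnLimit/MaxTRESPerNode caps apply to every job in that partition. A sketch with hypothetical partition and node names (QOS= is the standard partition parameter):

    # slurm.conf: attach each QOS as the partition QOS
    # (hypothetical partition/node names)
    PartitionName=lcncpu Nodes=lcn[01-08] QOS=lcncpu-part   # CPU jobs: at most 32 cores, 0 GPUs per node
    PartitionName=lcngpu Nodes=lcn[01-08] QOS=lcngpu-part   # GPU jobs use the remaining cores and the GPUs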

[slurm-users] Re: cpus and gpus partitions and how to optimize the resource usage

2025-03-31 Thread Massimo Sgaravatto via slurm-users
Hi Davide, thanks for your feedback. If gpu01 and cpusingpu01 are physically the same node, doesn't this mean that I have to start 2 slurmd daemons on that node (one with "slurmd -N gpu01" and one with "slurmd -N cpusingpu01")? Thanks, Massimo On Mon, Mar 31, 2025 at 3:22 PM Davide DelVento wrote: >
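
Yes, that scheme needs one slurmd per logical node. Slurm can run several slurmd instances on one host (the build must be configured with --enable-multiple-slurmd) as long as each logical node has its own NodeName, shares the host's NodeHostname, and listens on a distinct port. A sketch, assuming gpu01 and cpusingpu01 split a 64-core, 4-GPU machine; the CPU split and port numbers are assumptions:

    # slurm.conf: two logical nodes on one physical host
    NodeName=gpu01       NodeHostname=node01 Port=6818 CPUs=32 Gres=gpu:4
    NodeName=cpusingpu01 NodeHostname=node01 Port=6819 CPUs=32

    # on the host, start one daemon per logical node:
    slurmd -N gpu01
    slurmd -N cpusingpu01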

[slurm-users] Re: Preemption question

2025-03-31 Thread Kamil Wilczek via slurm-users
Hello Megan, this looks like a solution, thank you! The reason I asked for an option that can be set once for only one QoS (one that should be preempted by all other QoSes) is that I use Ansible for managing my users, and I have a YAML file with all users' data. I was hoping to avoid adding an optio
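
For context, the standard QoS-based preemption setup requires naming the preemptable QoS in the Preempt field of every QoS that may preempt it, which is exactly the per-QoS bookkeeping the poster wanted to avoid. A sketch with hypothetical QoS names, assuming PreemptType=preempt/qos in slurm.conf:

    # hypothetical QoS names; each higher QoS must list what it may preempt
    sacctmgr add qos scavenger
    sacctmgr modify qos scavenger set PreemptMode=requeue
    sacctmgr modify qos normal set Preempt=scavenger
    sacctmgr modify qos high set Preempt=scavenger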