To me at least, the simplest solution would be to create 3 partitions:
the first for the CPU-only nodes, the second for the GPU nodes, and the
third a lower-priority requeue partition. This is how we do it here.
This way the requeue partition can be used to grab the CPUs on the GPU
nodes without getting in the way of GPU jobs, since work submitted there
can be preempted and requeued when the GPU partition needs the node.
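Roughly, the slurm.conf side could look something like this (node lists,
partition names and preemption settings below are only illustrative,
assuming partition-priority preemption):

PreemptType=preempt/partition_prio
PreemptMode=REQUEUE

PartitionName=cpu     Nodes=cpu[001-010] PriorityTier=10 Default=YES
PartitionName=gpu     Nodes=gpu[001-004] PriorityTier=10
# lower-priority scavenger partition that also covers the GPU nodes;
# jobs here get requeued when the gpu partition wants the node back
PartitionName=requeue Nodes=cpu[001-010],gpu[001-004] PriorityTier=1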
What I have done is set up partition QOSes for nodes with 4 GPUs and 64
cores:
sacctmgr add qos lcncpu-part
sacctmgr modify qos lcncpu-part set priority=20 \
flags=DenyOnLimit MaxTRESPerNode=cpu=32,gres/gpu=0
sacctmgr add qos lcngpu-part
sacctmgr modify qos lcngpu-part set priority=20 \
    flags=DenyOnLimit MaxTRESPerNode=cpu=32,gres/gpu=4
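For what it's worth, a partition QOS like that only applies once it is
attached to the partition via QOS= in slurm.conf; a minimal sketch, with
partition and node names that are made up here:

PartitionName=lcncpu Nodes=gpunode[01-08] QOS=lcncpu-part
PartitionName=lcngpu Nodes=gpunode[01-08] QOS=lcngpu-part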
Hi Davide
Thanks for your feedback
If gpu01 and cpusingpu01 are physically the same node, doesn't this mean
that I have to start two slurmd daemons on that node (one with "slurmd -N gpu01"
and one with "slurmd -N cpusingpu01")?
Thanks, Massimo
On Mon, Mar 31, 2025 at 3:22 PM Davide DelVento wrote:
Hello Megan,
this looks like a solution, thank you!
The reason I asked for an option that can be set once on a single QoS
(the one that should be preempted by all other QoSes) is that I use
Ansible for managing my users, and I have a YAML file with all the user
data. I was hoping to avoid adding an option to every user entry.
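For reference, my understanding is that the per-QoS way of doing this is
a handful of one-time sacctmgr commands (the QoS names below are just
examples), which is still far better than touching every user entry:

# low-priority QoS that everything else may preempt
sacctmgr modify qos scavenger set Priority=0 PreemptMode=requeue
# add it to the Preempt list of each of the other QoSes
sacctmgr modify qos normal set Preempt+=scavenger
sacctmgr modify qos high   set Preempt+=scavenger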