I would suggest that if those applications really cannot be run under
Slurm, then reserve a set of nodes for interactive use and disable the
Slurm daemon on them.
Direct users to those nodes.
More constructively - maybe the list can help you get the X11 applications
to run using Slurm.
Could yo
Hello Michael,
Thank you for your email and apologies for my tardy response. I'm still sorting
out my mailbox after an Easter break. I've taken your comments on board and
I'll see how I go with your suggestions.
Best regards,
David
Hi everyone,
We have a single node with 8 GPUs. Users often pile up lots of pending jobs and
use all 8 at the same time, so a user who just wants to run a short debug job
on one GPU ends up waiting too long for a GPU to free up. Is there a way with
gres.conf to handle this?
We put a ‘gpu’ QOS on all our GPU partitions, and limit jobs per user to 8 (our
GPU capacity) via MaxJobsPerUser. Extra jobs get blocked, allowing other users
to queue jobs ahead of the extras.
# sacctmgr show qos gpu format=name,maxjobspu
      Name MaxJobsPU
---------- ---------
       gpu         8
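If it helps, the setup behind that is roughly the following (QOS, partition and
node names are placeholders for your site, and QOS limits only take effect if
AccountingStorageEnforce includes 'limits'):

# create the QOS and cap jobs per user
sacctmgr add qos gpu
sacctmgr modify qos gpu set MaxJobsPerUser=8

# attach it as the partition QOS in slurm.conf, then reread the config
PartitionName=gpu Nodes=gpunode01 QOS=gpu State=UP
scontrol reconfigure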
Chris
Upon further testing this morning I see the job is assigned two different
job IDs, something I wasn't expecting. This led me down the road of thinking
the output was incorrect.
Scontrol on a heterogeneous job will show multiple job IDs for the job. So the
output just wasn't what I was expecting.
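For example, a two-component heterogeneous batch job along the lines of the
sketch below (binary names are placeholders) shows up in squeue/scontrol as
<jobid>+0 and <jobid>+1, one record per component:

#!/bin/bash
#SBATCH --ntasks=1 --cpus-per-task=4   # component 0
#SBATCH hetjob
#SBATCH --ntasks=8                     # component 1

# each component can be addressed separately at launch time
srun --het-group=0 ./controller &
srun --het-group=1 ./workers &
wait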
Here's how we handle this here:
Create a separate partition named debug that also contains that node.
Give the debug partition a very short time limit, say 30-60 minutes.
Long enough for debugging, but too short to do any real work. Make the
priority of the debug partition much higher than the regular partition so
that debug jobs start ahead of the long-running work.
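A rough sketch of the slurm.conf side, assuming a single GPU node named
gpunode01 (names and limits are placeholders to adjust for your site):

# normal partition: default, long time limit, low priority tier
PartitionName=gpu   Nodes=gpunode01 Default=YES MaxTime=2-00:00:00 PriorityTier=1 State=UP
# debug partition on the same node: short time limit, high priority tier
PartitionName=debug Nodes=gpunode01 MaxTime=00:30:00 PriorityTier=10 State=UP

Jobs in the higher PriorityTier partition are considered for scheduling ahead
of jobs in the lower one on the shared node; PriorityJobFactor is the gentler
alternative if you only want a priority boost rather than strict ordering.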