On 25/09/20 00:04, Relu Patrascu wrote:
> 1. Allow preemption in the same QOS, all else being equal, based on job
> priority.
You'd risk having jobs continuously preempted by jobs that have been in the
queue for a while: once a job starts, it stops accumulating priority, so
another job preempts the running one.
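For reference, preemption between different QOSes is already possible today; a
minimal sketch of that setup (the QOS names here are made up) looks like:

  # slurm.conf
  PreemptType=preempt/qos
  PreemptMode=REQUEUE

  # let jobs in the "high" QOS preempt jobs running under "low"
  sacctmgr modify qos high set preempt=low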
Hi team,
I have extracted the %utilization report and found that the idle time is at
the higher end, so I wanted to check: is there any way we can find
node-based utilization?
It would help us figure out which nodes are underutilized.
Regards,
Navin.
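One rough way to get a node-level view (a sketch only; adjust the format
string and the time window to your setup) is to combine sinfo's allocation
counters with sacct:

  # current allocation per node: CPUs shown as Allocated/Idle/Other/Total
  sinfo -N -o "%N %T %C"

  # node list per job over a period, to aggregate per node afterwards
  sacct -a -X -S 2020-09-01 -E 2020-09-30 -o JobID,NodeList,AllocCPUS,Elapsed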
Hi Jason,
We're taking the approach proposed in
https://bugs.schedmd.com/show_bug.cgi?id=7919: same RPM everywhere,
but without the dependencies that you don't want installed globally
(like NVML, PMIx...). Of course you need to satisfy those dependencies
some other way on the nodes that require them.
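If it helps, you can check what the resulting package actually pulls in before
distributing it, e.g. (the package file name is just an example):

  # list the run-time requirements of the built RPM
  rpm -qpR slurm-20.02.5-1.el7.x86_64.rpm | egrep -i 'nvidia|pmix'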
Hey,
About once a day one or more Slurmd daemons running in our cluster
stop accepting new jobs, and they only recover when Slurmd is
restarted. The nodes are marked as "down", with the reason given as
"not responding". We are running version 20.02.0. Right at the time
this issue occurs the Slur
Hello all,
We're mostly a GPU compute shop, and we've been happy with slurm for the
last three years, but we think slurm would benefit from the following
two features:
1. Allow preemption in the same QOS, all else being equal, based on job
priority.
2. Job size calculation to take into acc
That's what we do here. We build three different RPMs:
- server: because we run the latest MariaDB on our master
- general compute
- gpu compute: because we build against NVML
We name these all the same but keep them in different repos and
distribute the repos to each node appropriately.
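In case it is useful, the distribution side can be as simple as a different
.repo file per node class; a sketch of what such a file might look like
(names and URLs are placeholders):

  # /etc/yum.repos.d/slurm-gpu.repo  (GPU nodes only)
  [slurm-gpu]
  name=Slurm built against NVML
  baseurl=http://repo.example.com/slurm/gpu/
  enabled=1
  gpgcheck=0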
Hello,
I hopefully have a quick question.
I have compiled Slurm RPMs on a CentOS system with NVIDIA drivers installed so
that I can use the AutoDetect=nvml configuration in our GPU nodes’ gres.conf.
All seems to be going well on the GPU nodes since I did that. I was
unable to install the
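For reference, the NVML autodetect setup described there boils down to
something like this (node names and GPU counts are placeholders):

  # gres.conf on the GPU nodes
  AutoDetect=nvml

  # slurm.conf
  GresTypes=gpu
  NodeName=gpu[01-04] Gres=gpu:4   # plus the usual CPU/memory settings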
We have installed Slurm 20.02.5 and I am trying to use the new MAGNETIC
reservation flag:
* https://slurm.schedmd.com/reservations.html
From this page I understand that the job will land in the reservation even if we
did not specify the
reservation name. I tested it on our cluster setup but i
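For anyone else testing this: a magnetic reservation can be created roughly
like so (name, duration and user are made up), and per the documentation a job
from that user should then be drawn into it without --reservation being set
at submission:

  scontrol create reservation ReservationName=test_magnetic \
      StartTime=now Duration=120 Users=alice NodeCnt=2 Flags=MAGNETIC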