On Thursday, 6 January 2022 at 22:39, David Henkemeyer
wrote:
> All,
>
> When my team used PBS, we had several nodes that had a TON of CPUs, so many,
> in fact, that we ended up setting np to a smaller value, in order to not
> starve the system of memory.
>
> What is the best way to do this with
Hi David,
On 1/6/22 22:39, David Henkemeyer wrote:
When my team used PBS, we had several nodes that had a TON of CPUs, so
many, in fact, that we ended up setting np to a smaller value, in order to
not starve the system of memory.
What is the best way to do this with Slurm? I tried modifying
Hi Bas,
Thank you very much for linking the bug report. The following solution
mentioned there helped me solve the problem.
1. Stop slurmdbd
systemctl stop slurmdbd
We are running SlurmDBD in Docker with Supervisord, so we have to stop it
as follows:
supervisorctl stop slurmdbd
2. Modify the
All,
When my team used PBS, we had several nodes that had a TON of CPUs, so
many, in fact, that we ended up setting np to a smaller value, in order to
not starve the system of memory.
What is the best way to do this with Slurm? I tried modifying # of CPUs in
the slurm.conf file, but I noticed th
Hi Martin,
My (quick and unrefined) thoughts about this:
This could only work if you don't have ConstrainDevices=yes in your
cgroup.conf, which I don't think is a good idea, as jobs could then use
GPUs allocated to other jobs.
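For reference, a minimal sketch of the setting discussed above. Only
ConstrainDevices=yes is taken from this message; the comment about the
task/cgroup plugin is an assumption about a typical setup, not something
stated in the thread.

```
# cgroup.conf (illustrative fragment)
# Restrict each job to the devices (e.g. GPUs) allocated to it as GRES.
# Assumes slurm.conf uses TaskPlugin=task/cgroup so the limit is enforced.
ConstrainDevices=yes
```

With this line absent or set to no, device access is not restricted per
job, which is the situation the next paragraph assumes.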
Let's assume you don't use ConstrainDevices=yes:
The GPUs allocated to
Hello, I'm reviving a somewhat old thread, but I just noticed that my
January 2021 message is not in the archives, so I'm sending it again now
that the issue has come up again on our side.
To quickly recap, we want to add permissions not only to /dev/nvidia*
devices based on the requested gres, bu
Hi Danny,
We had the same issue when we upgraded Slurm to 20.11, but maybe the
solution also works for you:
* https://bugs.schedmd.com/show_bug.cgi?id=12947
On 06/01/2022 15:36, Danny Marc Rotscher wrote:
Hello everyone,
Today we updated our Slurm database daemon from 20.02.2 to 20.02.7 and
Hello everyone,
Today we updated our Slurm database daemon from 20.02.2 to 20.02.7, and
everything works except that deleting a user fails.
sacctmgr -i delete user name=xyz account=xyz
sacctmgr: slurmdbd: No error
Nothing deleted
slurmdbd.log:
slurmdbd_1 | slurmdbd: error: mysql