[slurm-users] How to reinstall / reconfigure Slurm?

2024-04-03 Thread Shooktija S N via slurm-users
Hi, I am setting up Slurm on our lab's 3-node cluster and I have run into a problem while adding GPUs (each node has an NVIDIA 4070 Ti) as a GRES. There is an error at the 'debug' log level in slurmd.log that says that the GPU is file-less and is being removed from the final GRES list. This error

[slurm-users] Re: How to reinstall / reconfigure Slurm?

2024-04-03 Thread Williams, Jenny Avis via slurm-users
The Slurm source code should be downloaded and recompiled with the configuration flag --with-nvml. As an example, this is our current method, which uses the rpmbuild mechanism to recompile and generate RPMs. Be aware that the compile works only if it finds the prerequisites needed for a given op

[slurm-users] Re: Slurm 23.11 - Unknown system variable 'wsrep_on'

2024-04-03 Thread Timo Rothenpieler via slurm-users
On 02.04.2024 22:15, Russell Jones via slurm-users wrote: Hi all, I am working on upgrading a Slurm cluster from 20 -> 23. I was successfully able to upgrade to 22, however now that I am trying to go from 22 to 23, starting slurmdbd results in the following error being logged: error: mysql_

[slurm-users] scrun: Failed to run the container due to GID mapping configuration

2024-04-03 Thread Toshiki Sonoda (Fujitsu) via slurm-users
Dear All, We set up scrun (Slurm 23.11.5) integrated with rootless podman, referring to the official documentation: https://slurm.schedmd.com/containers.html#podman-scrun https://slurm.schedmd.com/scrun.html#SECTION_Example-%3CB%3Escrun.lua%3C/B%3E-scripts However, runc/crun prints the error mes