I don't know much about Slurm, but if you want to start troubleshooting you
need to isolate the step where the error appears. From the output you have
posted, it looks like you are using an automated script to download,
extract, and build Slurm. Look here:
"/bin/sh -c cd /tmp && wget
https://d
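If that is the case, a quick way to isolate the failing step is to run the
same commands by hand, one at a time, and see which one breaks. Roughly (the
URL and version below are placeholders, not taken from your output):

cd /tmp
wget <slurm-tarball-url>           # whatever URL your script actually fetches
tar -xaf slurm-<version>.tar.bz2
cd slurm-<version>
./configure                        # if this fails, the details are in config.log
make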
Hello,
Are the instructions for building Debian packages found at
https://slurm.schedmd.com/quickstart_admin.html#debuild expected to work on ARM
machines?
I am having trouble with the “debuild -b -uc -us” step.
#10 29.01 configure: exit 1
#10 29.01 dh_auto_configure: error: cd obj-aarch64-linux-
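For reference, the workflow from that page that I am following is roughly
this (the version number is just a placeholder for the tarball I downloaded):

apt-get install build-essential fakeroot devscripts equivs
tar -xaf slurm-<version>.tar.bz2
cd slurm-<version>
mk-build-deps -i debian/control
debuild -b -uc -us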
We are pleased to announce the availability of Slurm version 23.11.8.
The 23.11.8 release fixes some potential crashes in slurmctld,
slurmrestd, and slurmd when using less common features; two issues in
auth/slurm; and a few other minor bugs.
Slurm can be downloaded from https://www.schedmd.c
Hi, I am building a containerized Slurm cluster with Ubuntu 20.04 and have it
almost working.
The daemons start, and an “sinfo” command shows compute nodes up and available:
admin@slurmfrontend:~$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
slurmpar*    up   infinite      3   idle sl
There is a permission problem somewhere, but I don’t know where.
If I run as root, it works, but as a regular user it fails:
admin@slurmfrontend:~$ srun hostname
srun: error: task 0 launch failed: Slurmd could not execve job
slurmstepd: error: task_g_set_affinity: Operation not permitted
slurmstepd: error: _exec_wait_child_wait
Gestió Servidors via slurm-users wrote:
> What I want is that users could use all of them, but simultaneously a user
> could only use one of the RTX3080s.
How about two partitions: one containing only the RTX3080, using a QoS with
MaxTRESPerUser=gres/gpu=1, and another one with all the other GPUs not
havin
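As a rough sketch of that setup (the node, partition, and QoS names below are
made up, and it assumes the RTX3080 cards sit on their own node and that
AccountingStorageEnforce includes limits so the partition QoS is enforced):

# create a QoS that caps each user at one GPU
sacctmgr add qos rtx3080limit
sacctmgr modify qos rtx3080limit set MaxTRESPerUser=gres/gpu=1

# slurm.conf: attach that QoS to the partition holding only the RTX3080 node,
# and put the remaining GPU nodes in a second, unrestricted partition
PartitionName=rtx3080   Nodes=rtxnode01        QOS=rtx3080limit
PartitionName=othergpus Nodes=gpunode[01-02]   Default=YES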