Seems like the problem occurs here:

```
[2024-01-22T17:13:16.819] [63786.0] cgroup/v2: cgroup_p_constrain_apply: CGROUP: EBPF Closing and loading bpf program into /sys/fs/cgroup/system.slice/slurmstepd.scope/job_63786/step_0/user
[2024-01-22T17:13:16.819] [63786.0] error: load_ebpf_prog: BPF load error (No space left on device). Please check your system limits (MEMLOCK).
```
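Since the error message points at MEMLOCK, one quick sanity check is to look at the locked-memory limit. Below is a minimal sketch, assuming Python 3.9+ on Linux; the helper names `format_limit` and `memlock_limits` are made up for illustration. Note that it reports the limit of the process that runs it, which is not necessarily the limit slurmstepd runs under, and it does not show that MEMLOCK is the actual cause here.

```python
import resource


def format_limit(value: int) -> str:
    """Render an rlimit value (in bytes) in a readable form."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} KiB"


def memlock_limits() -> tuple[int, int]:
    """Return the (soft, hard) RLIMIT_MEMLOCK of the current process."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print(f"MEMLOCK soft: {format_limit(soft)}, hard: {format_limit(hard)}")
```

To see the limit of the running daemon itself, the "Max locked memory" row in `/proc/<pid>/limits` for the slurmstepd process is the more relevant place to look.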
Either Slurm is doing things the wrong way (and that wrong way happened to work until now), or the kernel changes between -89 and -91 actually broke something.

--
https://bugs.launchpad.net/bugs/2050098

Title:
  cgroup2 appears to be broken

Status in linux package in Ubuntu:
  New

Bug description:
  We're using the Slurm workload manager in a cluster with Ubuntu 22.04
  and the linux-generic kernel (amd64). We use cgroups (cgroup2) for
  resource allocation with Slurm.

  With kernel version

    linux-image-5.15.0-91-generic 5.15.0-91.101 amd64

  I'm seeing a new issue. This must have been introduced recently; I can
  confirm that with kernel 5.15.0-88-generic the issue does not exist.

  When I request a single GPU on a node with kernel 5.15.0-88-generic,
  all is well:

    $ srun -G 1 -w gpu59 nvidia-smi -L
    GPU 0: NVIDIA [...]

  With kernel 5.15.0-91-generic instead:

    $ srun -G 1 -w gpu59 nvidia-smi -L
    slurmstepd: error: load_ebpf_prog: BPF load error (No space left on device). Please check your system limits (MEMLOCK).
    GPU 0: NVIDIA [...]
    GPU 1: NVIDIA [...]
    GPU 2: NVIDIA [...]
    GPU 3: NVIDIA [...]
    GPU 4: NVIDIA [...]
    GPU 5: NVIDIA [...]
    GPU 6: NVIDIA [...]
    GPU 7: NVIDIA [...]

  So I get this error regarding the MEMLOCK limit and see all GPUs in
  the system instead of only the one requested. Hence I assume the
  problem is related to cgroups.

    $ cat /proc/version_signature
    Ubuntu 5.15.0-91.101-generic 5.15.131