There needs to be a slurmstepd infinity process running before slurmd starts.
The Slurm cgroup v2 documentation covers this:
https://slurm.schedmd.com/cgroup_v2.html

There is probably a better way to do this, but this is how we deal with it:

::::::::::::::
files/slurm-cgrepair.service
::::::::::::::
[Unit]
Before=slurmd.service slurmctld.service
After=nas-longleaf.mount remote-fs.target system.slice

[Service]
Type=oneshot
ExecStart=/callback/slurm-cgrepair.sh

[Install]
WantedBy=default.target
::::::::::::::
files/slurm-cgrepair.sh
::::::::::::::
#!/bin/bash
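# Enable the cpu, cpuset and memory controllers for child cgroups,
# first at the cgroup root and then in system.slice.
/usr/bin/echo "+cpu +cpuset +memory" >> /sys/fs/cgroup/cgroup.subtree_control && \
/usr/bin/echo "+cpu +cpuset +memory" >> /sys/fs/cgroup/system.slice/cgroup.subtree_control

# Start the slurmstepd infinity process in the background so it is
# already running when slurmd starts.
/usr/sbin/slurmstepd infinity &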
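
To confirm the script had the intended effect, something like the following
could be run afterwards. It is only a rough sketch: the helper name
missing_controllers is made up for illustration, and it assumes the
cgroup.subtree_control files hold a single space-separated list of
controller names, as in the cat output quoted further down.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm or systemd): print every controller
# named in the arguments that is NOT listed in the given subtree_control file.
# Prints an empty line when nothing is missing. The body runs in a subshell
# so its variables do not leak into the caller.
missing_controllers() (
    file=$1
    shift
    # Pad with spaces so that each name can be matched as a whole word.
    enabled=" $(tr '\n' ' ' < "$file")"
    missing=""
    for ctrl in "$@"; do
        case "$enabled" in
            *" $ctrl "*) ;;                    # controller is enabled
            *) missing="$missing $ctrl" ;;     # controller is absent
        esac
    done
    echo "${missing# }"
)

# Check the same two files the repair script writes to; skip any that do not
# exist (for example on a host that is not running cgroup v2).
for f in /sys/fs/cgroup/cgroup.subtree_control \
         /sys/fs/cgroup/system.slice/cgroup.subtree_control; do
    [ -r "$f" ] || continue
    m=$(missing_controllers "$f" cpu cpuset memory)
    if [ -n "$m" ]; then
        echo "$f is missing: $m"
    fi
done
```

If this prints nothing, cpu, cpuset and memory are enabled at both levels.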




From: Josef Dvoracek via slurm-users <slurm-users@lists.schedmd.com>
Sent: Thursday, April 11, 2024 11:14 AM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Re: Slurmd enabled crash with CgroupV2


I observe the same behavior on Slurm 23.11.5, Rocky Linux 8.9.

> [root@compute ~]# cat /sys/fs/cgroup/cgroup.subtree_control
> memory pids
> [root@compute ~]# systemctl disable slurmd
> Removed /etc/systemd/system/multi-user.target.wants/slurmd.service.
> [root@compute ~]# cat /sys/fs/cgroup/cgroup.subtree_control
> cpuset cpu io memory pids
> [root@compute ~]# systemctl enable slurmd
> Created symlink /etc/systemd/system/multi-user.target.wants/slurmd.service → /usr/lib/systemd/system/slurmd.service.
> [root@compute ~]# cat /sys/fs/cgroup/cgroup.subtree_control
> cpuset cpu io memory pids

Over time (I see this thread is ~1 year old), is there a better / newer
understanding of this?

cheers

josef


On 23. 05. 23 12:46, Alan Orth wrote:
I notice the exact same behavior as Tristan. My CentOS Stream 8 system is in 
full unified cgroupv2 mode, the slurmd.service has a "Delegate=Yes" override 
added to it, and all cgroup stuff is added to slurm.conf and cgroup.conf, yet 
slurmd does not start after reboot. I don't understand what is happening, but I 
see the exact same behavior regarding the cgroup subtree_control with disabling 
/ re-enabling slurmd.
