When I see odd behaviour like this, I've found it is sometimes related to either NTP issues (the time is off) or munge errors:

* Is NTP running, and is the time accurate?
* Look for munge errors:
  * /var/log/munge/munged.log
  * sudo systemctl status munge

If it's a munge error, usually restarting munge does the trick:

sudo systemctl restart munge

Regards
--Mick

________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Alan Orth <alan.o...@gmail.com>
Sent: Tuesday, August 16, 2022 4:36 PM
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Problems with cgroupsv2

I re-installed SLURM 22.05.3 and then restarted slurmd and now it's working:

# dnf reinstall slurm slurm-slurmd slurm-devel slurm-pam_slurm
# systemctl restart slurmd

The dnf.log shows that the versions were the same, so there was no mismatch or anything:

2022-08-16T23:29:02+0300 DEBUG Reinstalled: slurm-22.05.3-1.el8.x86_64
2022-08-16T23:29:02+0300 DEBUG Reinstalled: slurm-devel-22.05.3-1.el8.x86_64
2022-08-16T23:29:02+0300 DEBUG Reinstalled: slurm-pam_slurm-22.05.3-1.el8.x86_64
2022-08-16T23:29:02+0300 DEBUG Reinstalled: slurm-slurmd-22.05.3-1.el8.x86_64

So I'm not sure what's going on... anyways, at least it's working now!

Regards,

On Tue, Aug 16, 2022 at 12:53 PM Alan Orth <alan.o...@gmail.com> wrote:

Dear list,

I've been using cgroupsv2 with SLURM 22.05 on CentOS Stream 8 successfully for a few months now. Recently a few of my nodes have started having problems starting slurmd. The log shows:

[2022-08-16T20:52:58.439] slurmd version 22.05.3 started
[2022-08-16T20:52:58.439] error: Controller cpuset is not enabled!
[2022-08-16T20:52:58.439] error: Controller cpu is not enabled!
[2022-08-16T20:52:58.439] error: cpu cgroup controller is not available.
[2022-08-16T20:52:58.439] error: There's an issue initializing memory or cpu controller
[2022-08-16T20:52:58.439] error: Couldn't load specified plugin name for jobacct_gather/cgroup: Plugin init() callback failed
[2022-08-16T20:52:58.439] error: cannot create jobacct_gather context for jobacct_gather/cgroup
[2022-08-16T20:52:58.439] fatal: Unable to initialize jobacct_gather

The system has cgroupsv2 enabled as far as I can tell:

# cat /sys/fs/cgroup/cgroup.controllers
cpuset cpu io memory hugetlb pids rdma
# [ $(stat -fc %T /sys/fs/cgroup/) = "cgroup2fs" ] && echo "unified" || ( [ -e /sys/fs/cgroup/unified/ ] && echo "hybrid" || echo "legacy")
unified

And my slurm.conf has:

ProctrackType=proctrack/cgroup
TaskPlugin=task/affinity,task/cgroup

And cgroup.conf:

CgroupAutomount=yes
CgroupPlugin=autodetect

What else should I look for before giving up and reverting to cgroupsv1? My current version is 22.05.3, but it was happening in 22.05.2 as well.

Thank you for any advice.

--
Alan Orth
alan.o...@gmail.com
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
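
For anyone who hits the same "Controller ... is not enabled" errors, here is a minimal sketch of a check that goes one step beyond the `cat /sys/fs/cgroup/cgroup.controllers` quoted above. On cgroup v2, `cgroup.controllers` lists the controllers available to a cgroup, while `cgroup.subtree_control` lists the ones enabled for its children, so a controller can be available at the root and still not be enabled further down. The function name `missing_controllers` is made up for this example, the controller list (cpuset, cpu, memory) is taken from the error messages in the log, and whether these are the exact files slurmd consults is an assumption on my part.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): print the controllers from the
# argument list that are NOT present in a cgroup v2 controllers file.
# Usage: missing_controllers FILE CONTROLLER...
missing_controllers() {
    file=$1
    shift
    # Pad with spaces so that "cpu" does not match inside "cpuset".
    available=" $(tr '\n' ' ' < "$file") "
    missing=""
    for c in "$@"; do
        case "$available" in
            *" $c "*) ;;                      # controller is listed
            *) missing="$missing $c" ;;       # controller is absent
        esac
    done
    # Drop the leading space before printing.
    echo "${missing# }"
}

# Check both the "available" and the "enabled for children" files at the
# cgroup v2 root. Files that do not exist (e.g. on cgroup v1) are skipped.
for f in /sys/fs/cgroup/cgroup.controllers /sys/fs/cgroup/cgroup.subtree_control; do
    [ -r "$f" ] || continue
    echo "$f: missing: $(missing_controllers "$f" cpuset cpu memory)"
done
```

An empty list after "missing:" means all three controllers are listed in that file. The same function can be pointed at the `cgroup.subtree_control` of any intermediate cgroup if the root looks fine.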