Hello,
while trying to get some stats about a running job, I've realized that one of
the jobs consistently fails with:
,----
| sstat: error: slurm_receive_msgs: [[]:6818] failed: Socket timed out on send/recv operation
| sstat: error: slurm_job_step_stat: unknown return given from .ll.ia
`----
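To see which node and port each failed request went to, the error lines can be split into their parts. Below is a minimal Python sketch. It assumes the `[[host]:port] failed:` layout seen in the messages above (the host may be empty where it has been elided), and `parse_sstat_error` and `SstatError` are hypothetical helper names, not part of any Slurm tooling. Port 6818 is Slurm's default `SlurmdPort`, so a timeout there suggests the problem lies with the `slurmd` on one of the job's nodes rather than with `slurmctld`.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, taken from the sstat messages quoted above:
#   sstat: error: <function>: [[<host>]:<port>] failed: <message>
#   sstat: error: <function>: <message>
_ERR_RE = re.compile(
    r"^sstat: error: (?P<func>\w+): "
    r"(?:\[\[(?P<host>[^\]]*)\]:(?P<port>\d+)\] failed: )?"
    r"(?P<msg>.*)$"
)


class SstatError(NamedTuple):
    func: str              # Slurm function that reported the error
    host: Optional[str]    # None if the line has no [[host]:port] part
    port: Optional[int]
    message: str


def parse_sstat_error(line: str) -> Optional[SstatError]:
    """Split one 'sstat: error:' line into its parts; return None otherwise."""
    m = _ERR_RE.match(line.strip())
    if m is None:
        return None
    port = m.group("port")
    return SstatError(
        func=m.group("func"),
        host=m.group("host"),
        port=int(port) if port is not None else None,
        message=m.group("msg"),
    )


# Usage on the two lines quoted above (host names elided as in the original)
for line in [
    "sstat: error: slurm_receive_msgs: [[]:6818] failed: "
    "Socket timed out on send/recv operation",
    "sstat: error: slurm_job_step_stat: unknown return given from .ll.ia",
]:
    print(parse_sstat_error(line))
```

Grouping the parsed errors by host would show whether the timeouts always involve the same node.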
And to close the loop on this: the "smail" fix will be in 23.02.4 when
it's released, see
https://bugs.schedmd.com/show_bug.cgi?id=17123
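To tell whether a given installation already carries the fix, a version comparison is enough. A minimal Python sketch, assuming plain `X.Y.Z` version strings (optionally prefixed by `slurm `, as printed by `sinfo --version`) and assuming every later release also contains the fix; `has_smail_fix` and `parse_version` are hypothetical names.

```python
from typing import Tuple

# Release said above to contain the "smail" fix (bug 17123)
FIX_VERSION: Tuple[int, ...] = (23, 2, 4)


def parse_version(text: str) -> Tuple[int, ...]:
    """'slurm 23.02.3' or '23.02.3' -> (23, 2, 3); suffixes like '-rc1' are not handled."""
    return tuple(int(part) for part in text.split()[-1].split("."))


def has_smail_fix(installed: str) -> bool:
    # Assumes every release at or after 23.02.4 includes the fix
    return parse_version(installed) >= FIX_VERSION


print(has_smail_fix("slurm 23.02.3"))  # prints False
print(has_smail_fix("23.02.4"))        # prints True
```

Comparing integer tuples avoids the string-comparison trap where "23.02.10" would sort before "23.02.4".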
Cheers,
--
Kilian
On Mon, Jul 3, 2023 at 9:30 AM Angel de Vicente wrote:
>
> Hello,
>
> Angel de Vicente writes:
>
> > Any idea what could be going on or how to de
Additional configuration information -- /etc/slurm/cgroup.conf
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
CgroupPlugin=cgroup/v2
AllowedSwapSpace=1
ConstrainSwapSpace=yes
ConstrainDevices=yes
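For a quick check of what the cgroup.conf above actually requests, here is a minimal Python sketch that reads its Key=Value lines. It assumes the simple one-setting-per-line layout shown and does not implement Slurm's full configuration grammar; `parse_cgroup_conf` and `enabled_constraints` are hypothetical helpers.

```python
from typing import Dict, List


def parse_cgroup_conf(text: str) -> Dict[str, str]:
    """Read simple Key=Value lines; text after '#' is treated as a comment."""
    conf: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue  # blank line, comment, or something this sketch does not handle
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def enabled_constraints(conf: Dict[str, str]) -> List[str]:
    """Constrain* options set to 'yes', in file order."""
    return [k for k, v in conf.items() if k.startswith("Constrain") and v.lower() == "yes"]


CGROUP_CONF = """\
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
CgroupPlugin=cgroup/v2
AllowedSwapSpace=1
ConstrainSwapSpace=yes
ConstrainDevices=yes
"""

conf = parse_cgroup_conf(CGROUP_CONF)
print(conf.get("CgroupPlugin"))   # prints cgroup/v2
print(enabled_constraints(conf))  # prints ['ConstrainCores', 'ConstrainRAMSpace', 'ConstrainSwapSpace', 'ConstrainDevices']
```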
From: Williams, Jenny Avis
Sent: Tuesday, July 11, 2023 10:47 AM
To: slurm-us...@schedm
Progress on getting slurmd to start under cgroupv2
Issue: slurmd 22.05.6 will not start when using cgroupv2
Expected result: even after a reboot, slurmd starts up without anyone having to
manually add lines to /sys/fs/cgroup files.
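The report does not say which /sys/fs/cgroup files had to be edited by hand. One plausible thing to check at boot, before slurmd starts, is which controllers the root cgroup enables for its children, as listed in /sys/fs/cgroup/cgroup.subtree_control. A minimal Python sketch under that assumption; the controller set `WANTED` and the function names are hypothetical and should be matched to what your Slurm version needs (device constraints are not checked here).

```python
from pathlib import Path
from typing import Iterable, Set

# Hypothetical choice of controllers, assumed from the cgroup.conf above
# (ConstrainCores -> cpuset, ConstrainRAMSpace/ConstrainSwapSpace -> memory).
# Adjust to whatever your Slurm version actually requires.
WANTED: Set[str] = {"cpuset", "cpu", "memory"}


def enabled_controllers(subtree_control: str) -> Set[str]:
    """cgroup.subtree_control lists enabled controllers separated by spaces."""
    return set(subtree_control.split())


def missing_controllers(subtree_control: str, wanted: Iterable[str] = WANTED) -> Set[str]:
    """Controllers from `wanted` that are not enabled in the given file content."""
    return set(wanted) - enabled_controllers(subtree_control)


def check_root(root: Path = Path("/sys/fs/cgroup")) -> Set[str]:
    """Read the root cgroup's subtree_control on a cgroup v2 host and report gaps."""
    return missing_controllers((root / "cgroup.subtree_control").read_text())


print(sorted(missing_controllers("cpu io memory pids")))  # prints ['cpuset']
```

Running `check_root()` on a node right after a reboot would show whether any of the assumed controllers are still missing when slurmd comes up.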
When started as a service, the error is:
# systemctl status slurmd
* s