[slurm-users] sstat -a: Socket timed out on send/recv operation

2023-07-11 Thread Angel de Vicente
Hello, trying to get some stats about a running job, I've realized that one of the jobs is consistently failing with:
| sstat: error: slurm_receive_msgs: [[]:6818] failed: Socket timed out on send/recv operation
| sstat: error: slurm_job_step_stat: unknown return given from .ll.ia

Re: [slurm-users] END Mail notifications not being sent?

2023-07-11 Thread Kilian Cavalotti
And to close the loop on this, the "smail" fix will be in 23.02.4 when it's released: https://bugs.schedmd.com/show_bug.cgi?id=17123
Cheers,
--
Kilian
On Mon, Jul 3, 2023 at 9:30 AM Angel de Vicente wrote:
> Hello,
>
> Angel de Vicente writes:
>
> > Any idea what could be going on or how to de

Re: [slurm-users] cgroupv2 + slurmd - external cgroup changes needed to get daemon to start

2023-07-11 Thread Williams, Jenny Avis
Additional configuration information -- /etc/slurm/cgroup.conf:
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
CgroupPlugin=cgroup/v2
AllowedSwapSpace=1
ConstrainSwapSpace=yes
ConstrainDevices=yes
From: Williams, Jenny Avis
Sent: Tuesday, July 11, 2023 10:47 AM
To: slurm-us...@schedm

[slurm-users] cgroupv2 + slurmd - external cgroup changes needed to get daemon to start

2023-07-11 Thread Williams, Jenny Avis
Progress on getting slurmd to start under cgroupv2
Issue: slurmd 22.05.6 will not start when using cgroupv2.
Expected result: even after a reboot, slurmd will start up without needing to manually add lines to /sys/fs/cgroup files.
When started as a service the error is:
# systemctl status slurmd
* s
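
For anyone following this thread, a rough diagnostic sketch (not taken from the original message) for checking which cgroup v2 controllers are available at a given level but not yet enabled for child cgroups. It compares the kernel's cgroup.controllers and cgroup.subtree_control files. The mount point /sys/fs/cgroup matches the path mentioned above; the default controller list (cpuset, memory) is only a guess based on the ConstrainCores and ConstrainRAMSpace settings in this thread, not a statement of what slurmd actually requires. The function names are hypothetical.

```python
from pathlib import Path

# Assumption: the usual cgroup v2 mount point, as mentioned in the thread.
CGROUP_ROOT = Path("/sys/fs/cgroup")


def parse_controllers(text):
    """Split the space-separated contents of a cgroup v2 controller file."""
    return set(text.split())


def missing_controllers(available_text, enabled_text, wanted):
    """Return (not_enabled, unavailable) for the wanted controllers.

    not_enabled: listed in cgroup.controllers but absent from
                 cgroup.subtree_control, so children cannot use them.
    unavailable: not listed in cgroup.controllers at this level at all.
    """
    available = parse_controllers(available_text)
    enabled = parse_controllers(enabled_text)
    wanted = set(wanted)
    unavailable = wanted - available
    not_enabled = (wanted & available) - enabled
    return sorted(not_enabled), sorted(unavailable)


def check(root=CGROUP_ROOT, wanted=("cpuset", "memory")):
    """Read the two controller files under `root` and compare them.

    The default `wanted` tuple is an illustrative assumption, not a
    documented Slurm requirement.
    """
    available = (root / "cgroup.controllers").read_text()
    enabled = (root / "cgroup.subtree_control").read_text()
    return missing_controllers(available, enabled, wanted)
```

Calling check() on the affected node returns two sorted lists; a controller in the first list is available but has not been enabled in cgroup.subtree_control at that level, which may or may not be related to the startup failure described here.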