[slurm-users] Re: slurmctld hourly: Unexpected missing socket error

2024-07-22 Thread Jason Ellul via slurm-users
Hi Patryk, Thanks so much for your email. There are a couple of things you list that we have not tried yet, so we will definitely look at them. You mention optimizing SSSD, which has me curious: are you using Red Hat Identity Management (FreeIPA)? Because we are, and after going through our logs

[slurm-users] Re: Cgroup

2024-07-22 Thread Ole Holm Nielsen via slurm-users
On 7/22/24 12:05, stth via slurm-users wrote: I am configuring cgroups on my server for the first time. I've created a cgroup.conf file in the Slurm directory with the following values: ConstrainCores=yes ConstrainRAMSpace=yes ConstrainSwapSpace=yes AllowedRAMSpace=90 AllowedSwapSpace=10

[slurm-users] Cgroup

2024-07-22 Thread stth via slurm-users
Hello, I am configuring cgroups on my server for the first time. I've created a cgroup.conf file in the Slurm directory with the following values: ConstrainCores=yes ConstrainRAMSpace=yes ConstrainSwapSpace=yes AllowedRAMSpace=90 AllowedSwapSpace=10 I feel like this configuration might be incomp

[slurm-users] Re: slurmctld hourly: Unexpected missing socket error

2024-07-22 Thread Patryk Bełzak via slurm-users
Hi, we've been facing the same issue for some time. At the beginning the missing socket error happened every 20 minutes, later once per hour, and now it happens a few times a day. The only downside was that the controller was unresponsive for a couple of seconds each time, up to 60 if I remember correctly.