[slurm-users] Re: Print Slurm Stats on Login

2024-08-21 Thread Patryk Bełzak via slurm-users
Hi, what Ole wrote is exactly what crossed my mind. I had an episode with stats at login too: I put reportseff into the MOTD script and it was a bad idea. It turned out that if, for any reason, the Slurm controller took longer to respond, it delayed the user's login, which annoyed them more than they appreciated …
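(A minimal sketch of the mitigation implied above, not the poster's actual script: cap how long the login-time stats hook may wait on the controller, so a slow slurmctld cannot hold up the login itself. The file path and the reportseff invocation are assumptions.)
```
# /etc/profile.d/zz-slurm-stats.sh -- hypothetical MOTD-style hook
# Give the controller at most 2 seconds; on timeout print nothing
# rather than delaying the user's shell.
if command -v reportseff >/dev/null 2>&1; then
    timeout 2 reportseff --user "$USER" 2>/dev/null || true
fi
```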

[slurm-users] Re: slurmctld hourly: Unexpected missing socket error

2024-07-24 Thread Patryk Bełzak via slurm-users

[slurm-users] Re: slurmctld hourly: Unexpected missing socket error

2024-07-22 Thread Patryk Bełzak via slurm-users
Hi, we've been facing the same issue for some time. At the beginning the missing socket error happened every 20 minutes, later once per hour; now it happens a few times a day. The only downside of this was that the controller was unresponsive for a couple of seconds, up to 60 if I remember correctly.

[slurm-users] Re: Problems with gres.conf

2024-06-04 Thread Patryk Bełzak via slurm-users
Hi, I believe that setting Cores in gres.conf explicitly gives you better control over the hardware configuration; I wouldn't trust Slurm on that one. We have gres.conf with "Cores" set; all you have to do is proper NUMA discovery (as long as your hardware has NUMA), and then assign the correct cores …
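(A hedged gres.conf sketch of the approach described above; the node names, GPU type, device files and core ranges are assumptions, and the right Cores= values come from your own NUMA discovery, e.g. `lscpu` or `nvidia-smi topo -m`.)
```
# gres.conf -- hypothetical example: pin each GPU to the cores of its
# local NUMA node instead of relying on autodetection.
NodeName=node[01-04] Name=gpu Type=a100 File=/dev/nvidia0 Cores=0-15
NodeName=node[01-04] Name=gpu Type=a100 File=/dev/nvidia1 Cores=16-31
```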

[slurm-users] Re: srun weirdness

2024-05-17 Thread Patryk Bełzak via slurm-users
make sense? > I also missed that setting in slurm.conf, so good to know it is possible to change the default behaviour. > Tom > From: Patryk Bełzak via slurm-users > Date: Friday, 17 May 2024 at 10:15 > To: Dj Merrill > Cc: slurm-users@lists.schedmd.com
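(The exact slurm.conf setting referred to is cut off in this preview. A plausible reading, offered purely as an assumption, is the resource-limit propagation behaviour, which can be changed along these lines.)
```
# slurm.conf -- hedged sketch: stop srun propagating the login shell's
# memlock limit to the job, so the compute node's own
# /etc/security/limits.d/ values apply instead.
PropagateResourceLimitsExcept=MEMLOCK
```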

[slurm-users] Re: srun weirdness

2024-05-17 Thread Patryk Bełzak via slurm-users
Hi, I wonder where these problems come from; perhaps I am missing something, but we never had such issues with limits, since we have them set on the worker nodes in /etc/security/limits.d/99-cluster.conf:
```
*    soft    memlock    4086160    # Allow more Memory Locks for MPI
*    hard    memlock
```