Hi,
what Ole wrote is exactly what crossed my mind. I had an episode with stats at
login too: I put reportseff into the motd script, and it was a bad idea. It
turned out that if for any reason the slurm controller took longer to respond,
it delayed user logins, which annoyed users more than they appreciated the stats.
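If you still want per-login stats, a guard along these lines keeps the login
from hanging (a sketch only; the profile.d path and the exact reportseff
invocation are assumptions, adjust to your setup):
```
#!/bin/sh
# /etc/profile.d/zz-jobstats.sh (hypothetical path)
# Show recent job efficiency at login, but never block the login:
# 'timeout 2' kills reportseff if slurmctld/slurmdbd is slow to answer,
# and any failure is silently ignored.
stats=$(timeout 2 reportseff --user "$USER" 2>/dev/null) && printf '%s\n' "$stats"
```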
Hi,
we've been facing the same issue for some time. At the beginning the missing
socket error happened every 20 minutes, later once per hour; now it happens a
few times a day.
The only downside of this was that the controller was unresponsive for those
few seconds, up to 60 if I remember correctly.
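To put numbers on those stalls, a rough sketch like this can help: sdiag is
answered by slurmctld itself, so a slow reply lines up with the windows in
which clients see the missing socket error.
```
#!/bin/bash
# Poll the controller once a minute and log how long it takes to answer.
while sleep 60; do
    start=$SECONDS
    sdiag > /dev/null 2>&1
    echo "$(date -Is) sdiag answered in $((SECONDS - start))s"
done
```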
Hi,
I believe that setting cores in gres.conf explicitly gives you better control
over the hardware configuration; I wouldn't trust slurm on that one.
We have gres.conf with "Cores" set; all you have to do is proper NUMA
discovery (as long as your hardware has NUMA), and then assign the correct
cores to each device. Does that make sense?
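For illustration, a minimal sketch of such a gres.conf for a hypothetical
two-socket node with two GPUs per socket (the Type, device files, and core
ranges are assumptions; take the real mapping from `lstopo` or
`nvidia-smi topo -m`):
```
# gres.conf - hypothetical 2-socket node, 2 GPUs per NUMA domain.
# Cores must match the NUMA domain each GPU is attached to.
Name=gpu Type=a100 File=/dev/nvidia0 Cores=0-15
Name=gpu Type=a100 File=/dev/nvidia1 Cores=0-15
Name=gpu Type=a100 File=/dev/nvidia2 Cores=16-31
Name=gpu Type=a100 File=/dev/nvidia3 Cores=16-31
```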
>
> I also missed that setting in slurm.conf, so it's good to know it is
> possible to change the default behaviour.
>
> Tom
Hi,
I wonder where these problems come from; perhaps I am missing something, but
we have never had such issues with limits, since we set them on the worker
nodes in /etc/security/limits.d/99-cluster.conf:
```
* soft memlock 4086160 # Allow more memory locks for MPI
* hard memlock
```
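Whether those limits actually reach the jobs also depends on how slurm
propagates resource limits (the PropagateResourceLimits setting in
slurm.conf). A quick check from a login node (a sketch; pick any partition
you can run on):
```
# Check the memlock limit as actually seen inside a job step
srun -N1 -n1 bash -c 'ulimit -l'
```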