[slurm-users] srun : Communication connection failure

2022-01-20 Thread Durai Arasan
Hello Slurm users,

We are suddenly encountering strange errors while trying to launch interactive jobs on our cpu partitions. Have you encountered this problem before? Kindly let us know.

[darasan84@bg-slurmb-login1 ~]$ srun --job-name "admin_test231" --ntasks=1 --nodes=1 --cpus-per-task=1 --part
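The digest preview cuts the command off at the partition flag. For reference, a complete interactive launch along these lines would typically end with a partition name and a pseudo-terminal request; the partition name below (cpu-short) is purely hypothetical, since the real one is truncated out of the archive:

[darasan84@bg-slurmb-login1 ~]$ srun --job-name "admin_test231" --ntasks=1 --nodes=1 --cpus-per-task=1 --partition=cpu-short --pty bash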

Re: [slurm-users] srun : Communication connection failure

2022-01-20 Thread Durai Arasan
Hello Slurm users,

I forgot to mention that an identical interactive job works successfully on the gpu partitions (in the same cluster). So this is really puzzling.

Best,
Durai Arasan
MPI Tuebingen

On Thu, Jan 20, 2022 at 3:40 PM Durai Arasan wrote:
> Hello Slurm users,
>
> We are suddenly enc

Re: [slurm-users] [External] Re: srun : Communication connection failure

2022-01-20 Thread Michael Robbert
It looks like it could be some kind of network problem, but it could also be DNS. Can you ping and do DNS resolution for the host involved? What does slurmctld.log say? How about slurmd.log on the node in question?

Mike
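A minimal sketch of those checks, run from the login node; the compute node name (cn-001) and the log locations are assumptions, since neither appears in the thread:

[darasan84@bg-slurmb-login1 ~]$ ping -c 3 cn-001                          # basic reachability
[darasan84@bg-slurmb-login1 ~]$ host cn-001                               # forward DNS lookup
[darasan84@bg-slurmb-login1 ~]$ host 10.0.0.17                            # reverse lookup of the address ping reported (hypothetical IP)
[darasan84@bg-slurmb-login1 ~]$ ssh cn-001 sudo tail /var/log/slurm/slurmd.log    # slurmd log on the node in question
(and, on the controller host) $ sudo tail /var/log/slurm/slurmctld.log

Since srun itself listens on the submitting host for connections coming back from the compute nodes, it is also worth checking that no firewall blocks those return connections from the cpu nodes to the login node.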

[slurm-users] memory per node default

2022-01-20 Thread Hoot Thompson
How do you change the default memory per node from the current 1MB to something much higher? Thanks in advance.

ubuntu@node:/shared$ sinfo -o "%20N%10c%10m%25f%10G "
NODELIST            CPUS      MEMORY    AVAIL_FEATURES           GRES
hpc-demand-dy-c5n18x36        1         dynamic,c5n.18xlarge,c5n1(null)

Re: [slurm-users] memory per node default

2022-01-20 Thread Ole Holm Nielsen
On 1/20/22 22:22, Hoot Thompson wrote:
> How do you change the default memory per node from the current 1MB to something much higher? Thanks in advance.
>
> ubuntu@node:/shared$ sinfo -o "%20N%10c%10m%25f%10G "
> NODELIST            CPUS      MEMORY    AVAIL_FEATURES           GRES
> hpc-demand-dy-c5n18x36        1         dynamic,c5n.18xlarge
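Ole's answer is cut off in this digest, but the MEMORY column above shows the issue: when a node definition in slurm.conf omits RealMemory, Slurm falls back to a default of 1 MB. The usual fix is to declare each node's usable RAM (in MB) on its NodeName line. A minimal sketch; the full node name is truncated in the sinfo output above, and 190000 MB is only an assumed figure for a c5n.18xlarge instance:

# slurm.conf on the controller (kept in sync on the compute nodes)
NodeName=<full-node-name> CPUs=36 RealMemory=190000 Feature=dynamic,c5n.18xlarge

# push the change out (a slurmd restart may also be needed)
ubuntu@node:/shared$ sudo scontrol reconfigure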