Hello Slurm users,

We have suddenly started seeing strange errors when trying to launch
interactive jobs on our CPU partitions. Has anyone encountered this problem
before? Kindly let us know.

[darasan84@bg-slurmb-login1 ~]$ srun --job-name "admin_test231" --ntasks=1 --nodes=1 --cpus-per-task=1 --partition=cpu-short --mem=1G --nodelist=slurm-cpu-hm-7 --time 1:00:00 --pty bash
srun: error: Task launch for StepId=1137134.0 failed on node slurm-cpu-hm-7: Communication connection failure
srun: error: Application launch failed: Communication connection failure
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: Timed out waiting for job step to complete

Best regards,
Durai Arasan
MPI Tuebingen
