My site recently updated to Slurm 21.08.6, and for the most part everything went fine. Two Ubuntu nodes, however, are having issues: slurmd cannot execve jobs on them. As an example:

[jrlang@tmgt1 ~]$ salloc -A ARCC --nodes=1 --ntasks=20 -t 1:00:00 --bell --nodelist=mdgx01 --partition=dgx /bin/bash
salloc: Granted job allocation 2328489
[jrlang@tmgt1 ~]$ srun hostname
srun: error: task 0 launch failed: Slurmd could not execve job
srun: error: task 1 launch failed: Slurmd could not execve job
srun: error: task 2 launch failed: Slurmd could not execve job
srun: error: task 3 launch failed: Slurmd could not execve job
srun: error: task 4 launch failed: Slurmd could not execve job
srun: error: task 5 launch failed: Slurmd could not execve job
srun: error: task 6 launch failed: Slurmd could not execve job
srun: error: task 7 launch failed: Slurmd could not execve job
srun: error: task 8 launch failed: Slurmd could not execve job
srun: error: task 9 launch failed: Slurmd could not execve job
srun: error: task 10 launch failed: Slurmd could not execve job
srun: error: task 11 launch failed: Slurmd could not execve job
srun: error: task 12 launch failed: Slurmd could not execve job
srun: error: task 13 launch failed: Slurmd could not execve job
srun: error: task 14 launch failed: Slurmd could not execve job
srun: error: task 15 launch failed: Slurmd could not execve job
srun: error: task 16 launch failed: Slurmd could not execve job
srun: error: task 17 launch failed: Slurmd could not execve job
srun: error: task 18 launch failed: Slurmd could not execve job
srun: error: task 19 launch failed: Slurmd could not execve job

Looking in slurmd-mdgx01.log, we only see:

[2022-03-24T14:44:02.408] [2328501.interactive] error: Failed to invoke task plugins: one of task_p_pre_setuid functions returned error
[2022-03-24T14:44:02.409] [2328501.interactive] error: job_manager: exiting abnormally: Slurmd could not execve job
[2022-03-24T14:44:02.411] [2328501.interactive] done with job

Note that this issue didn't occur with Slurm 20.11.8.

Any ideas what could be causing the issue? I'm stumped.

Jeff