Hi Mike,
What version of Slurm are you using?
If you are running Slurm 20.11 or newer, note that the scheduler behavior
changed in that release: by default, srun no longer allows job steps to
overlap resources.
https://bugs.schedmd.com/show_bug.cgi?id=11863#c3
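As a rough aid for answering the version question, here is a minimal sketch of a check for whether a given version string falls at or after 20.11. The function name slurm_steps_exclusive_by_default is made up for illustration and is not part of Slurm. The sketch assumes the version is in the usual major.minor[.patch] form.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): succeeds when the given version
# string is 20.11 or newer, i.e. when job steps no longer overlap by default.
slurm_steps_exclusive_by_default() {
    version=$1
    major=${version%%.*}    # "20.11.8" -> "20"
    rest=${version#*.}      # "20.11.8" -> "11.8"
    minor=${rest%%.*}       # "11.8"    -> "11"
    # Newer major release, or release 20 with minor version 11 or later.
    [ "$major" -gt 20 ] || { [ "$major" -eq 20 ] && [ "$minor" -ge 11 ]; }
}

# Example use on a cluster, assuming `sinfo --version` prints "slurm X.Y.Z":
#   v=$(sinfo --version | awk '{print $2}')
#   if slurm_steps_exclusive_by_default "$v"; then
#       echo "Slurm $v: job steps do not overlap by default"
#   fi
```

If the check succeeds, srun's --overlap option (added in 20.11) is the documented way to let job steps share resources. Whether that resolves this particular spawn failure is not something this sketch can confirm.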
I would see if a
I have a user who is submitting a job to Slurm which requests 16 tasks, i.e.
#SBATCH --ntasks 16
#SBATCH --cpus-per-task 1
The Slurm script runs an MPI program called Parent.mpi, which then tries (and
fails) to call 15 MPI child processes. He’s tried two different ways for the parent to
spawn the childre