Hi guys, I've always been somewhat at a loss regarding Slurm's idea of tasks vs. jobs. That never caused any problems, though, until I moved to Open MPI 2 (2.0.2, to be precise, with Slurm 16.05.9).
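For reference, MPI-hellow is essentially the hello world from the tutorial linked below; I'm quoting it here so the output is easier to follow (my local copy may differ in cosmetic details):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char** argv) {
        /* Initialize the MPI environment */
        MPI_Init(NULL, NULL);

        /* Get the number of processes */
        int world_size;
        MPI_Comm_size(MPI_COMM_WORLD, &world_size);

        /* Get the rank of the calling process */
        int world_rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

        /* Get the name of the processor (node) */
        char processor_name[MPI_MAX_PROCESSOR_NAME];
        int name_len;
        MPI_Get_processor_name(processor_name, &name_len);

        /* Print a hello world message */
        printf("Hello world from processor %s, rank %d out of %d processors\n",
               processor_name, world_rank, world_size);

        /* Finalize the MPI environment */
        MPI_Finalize();
        return 0;
    }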
Running the hello world from http://mpitutorial.com/tutorials/mpi-hello-world as an example with just

    srun -n 2 MPI-hellow

yields

    Hello world from processor node31, rank 0 out of 1 processors
    Hello world from processor node31, rank 0 out of 1 processors

i.e. the two tasks don't see each other MPI-wise. Well, srun doesn't include an mpirun. But running

    srun -n 2 mpirun MPI-hellow

produces

    Hello world from processor node31, rank 1 out of 2 processors
    Hello world from processor node31, rank 0 out of 2 processors
    Hello world from processor node31, rank 1 out of 2 processors
    Hello world from processor node31, rank 0 out of 2 processors

i.e. I get *two* independent MPI runs with 2 processors each. (The same happens if I state "mpirun -np 2" explicitly.) I could never make sense of this squaring, so instead I have been running my jobs like

    srun -c 2 mpirun -np 2 MPI-hellow

which gave me the desired result: *one* task using 2 processors.

Since moving from Open MPI 1.6.5 to 2.0.2 (Debian Jessie -> Stretch), however, the same command fails with the error

    There are not enough slots available in the system to satisfy the 2 slots
    that were requested by the application:
      MPI-hellow

The environment on the node contains

    SLURM_CPUS_ON_NODE=2
    SLURM_CPUS_PER_TASK=2
    SLURM_JOB_CPUS_PER_NODE=2
    SLURM_NTASKS=1
    SLURM_TASKS_PER_NODE=1

which looks fine to me, but mpirun infers slots=1 from it (confirmed by setting ras_base_verbose to 5). Indeed, looking into orte/mca/ras/slurm/ras_slurm_module.c, I find that orte_ras_slurm_allocate() reads the value of SLURM_CPUS_PER_TASK into its local variable cpus_per_task, but never uses it anywhere; the number of slots is instead derived from SLURM_TASKS_PER_NODE alone.

Is this intended behaviour? What's wrong here? I know that I can use --oversubscribe, but that seems like a workaround rather than a fix.

Thanks in advance,
Lothar
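P.S. To illustrate what I mean about ras_slurm_module.c: the logic in orte_ras_slurm_allocate(), heavily condensed and paraphrased (this is NOT the actual Open MPI source, just a sketch of the pattern as I read it; only the environment variable names and cpus_per_task come from the real code), boils down to something like

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        /* SLURM_CPUS_PER_TASK is read into cpus_per_task ... */
        int cpus_per_task = 1;
        char *tmp = getenv("SLURM_CPUS_PER_TASK");
        if (NULL != tmp) {
            cpus_per_task = atoi(tmp);
        }
        /* ... but, as far as I can see, never used again. */

        /* The slot count comes from SLURM_TASKS_PER_NODE alone
         * (the real parser also handles multi-node forms like "2(x3)"). */
        int slots = 1;
        tmp = getenv("SLURM_TASKS_PER_NODE");
        if (NULL != tmp) {
            slots = atoi(tmp);
        }

        /* With SLURM_CPUS_PER_TASK=2 and SLURM_TASKS_PER_NODE=1 (my
         * "srun -c 2" allocation) this gives slots = 1, which is exactly
         * what mpirun reports and why "-np 2" is refused. */
        printf("slots = %d, cpus_per_task = %d (unused)\n",
               slots, cpus_per_task);
        return 0;
    }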