Couple of comments. Your original cmd line:

>> srun -n 2 mpirun MPI-hellow

tells srun to launch two copies of mpirun, each of which is to run as many processes as there are slots assigned to the allocation. srun will get an allocation of two slots, and so you'll get two concurrent MPI jobs, each consisting of two procs.

Your other cmd line:

>> srun -c 2 mpirun -np 2 MPI-hellow

told srun to get two slots but only run one copy of mpirun (the default value of the -n option), and you told mpirun to launch two procs. So you got one job consisting of two procs.

What you probably want to do is what Gilles advised. However, Slurm 16.05 only supports PMIx v1, so you'd want to download and build PMIx v1.2.5, and then build Slurm against it. OMPI v2.0.2 may have a slightly older copy of PMIx in it (I honestly don't remember) - to be safe, it would be best to configure OMPI to use the 1.2.5 you installed for Slurm. You'll also need to build OMPI against external copies of libevent and hwloc to ensure OMPI is linked against the same versions used by PMIx. Or you can just build OMPI against the Slurm PMI library - up to you.

Ralph

> On Nov 23, 2018, at 2:31 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>
> Lothar,
>
> it seems you did not configure Open MPI with --with-pmi=<path to SLURM's PMI>
>
> If SLURM was built with PMIx support, then another option is to use that.
> First, srun --mpi=list will show you the list of available MPI modules, and then you could
> srun --mpi=pmix_v2 ... MPI_Hellow
> If you believe that should be the default, then you should contact your sysadmin, who can make that change for you.
>
> If you want to use PMIx, then I recommend you configure Open MPI with the same external PMIx that was used to build SLURM (e.g. configure --with-pmix=<path to PMIx>). Though PMIx has cross-version support, using the same PMIx will avoid running incompatible PMIx versions.
>
> Cheers,
>
> Gilles
>
> On Fri, Nov 23, 2018 at 5:20 PM Lothar Brendel <lothar.bren...@uni-due.de> wrote:
>>
>> Hi guys,
>>
>> I've always been somewhat at a loss regarding Slurm's idea about tasks vs. jobs. That didn't cause any problems, though, until passing to Open MPI 2 (2.0.2, that is, with Slurm 16.05.9).
>>
>> Running http://mpitutorial.com/tutorials/mpi-hello-world as an example with just
>>
>> srun -n 2 MPI-hellow
>>
>> yields
>>
>> Hello world from processor node31, rank 0 out of 1 processors
>> Hello world from processor node31, rank 0 out of 1 processors
>>
>> i.e. the two tasks don't see each other MPI-wise. Well, srun doesn't include an mpirun.
>>
>> But running
>>
>> srun -n 2 mpirun MPI-hellow
>>
>> produces
>>
>> Hello world from processor node31, rank 1 out of 2 processors
>> Hello world from processor node31, rank 0 out of 2 processors
>> Hello world from processor node31, rank 1 out of 2 processors
>> Hello world from processor node31, rank 0 out of 2 processors
>>
>> i.e. I get *two* independent MPI tasks with 2 processors each. (The same applies if I state "mpirun -np 2" explicitly.)
>> I never could make sense of this squaring; I rather used to run my jobs like
>>
>> srun -c 2 mpirun -np 2 MPI-hellow
>>
>> which provided the desired job with *one* task using 2 processors. Passing from Open MPI 1.6.5 to 2.0.2 (Debian Jessie -> Stretch), though, I'm now getting the error
>> "There are not enough slots available in the system to satisfy the 2 slots that were requested by the application:
>> MPI-hellow".
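As a concrete illustration of the direct-launch route Gilles describes above, the invocation might look roughly like the sketch below. The plugin name is an assumption: on any given cluster it has to be taken from whatever srun --mpi=list actually reports, and since Slurm 16.05 only supports PMIx v1 (as Ralph notes), a pmix_v2 entry will generally not be available there.

    # ask Slurm which PMI plugins this installation provides
    srun --mpi=list

    # direct launch without mpirun; "pmix" is a placeholder for
    # whichever pmix/pmi2 plugin the list above actually shows
    srun --mpi=pmix -n 2 MPI-hellow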
>>
>> The environment on the node contains
>>
>> SLURM_CPUS_ON_NODE=2
>> SLURM_CPUS_PER_TASK=2
>> SLURM_JOB_CPUS_PER_NODE=2
>> SLURM_NTASKS=1
>> SLURM_TASKS_PER_NODE=1
>>
>> which looks fine to me, but mpirun infers slots=1 from that (confirmed via ras_base_verbose 5). Indeed, looking into orte/mca/ras/slurm/ras_slurm_module.c, I find that while orte_ras_slurm_allocate() reads the value of SLURM_CPUS_PER_TASK into its local variable cpus_per_task, it doesn't use it anywhere. Rather, the number of slots is determined from SLURM_TASKS_PER_NODE.
>>
>> Is this intended behaviour?
>>
>> What's wrong here? I know that I can use --oversubscribe, but that seems rather a workaround.
>>
>> Thanks in advance,
>> Lothar
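Putting Ralph's build advice into commands, the sequence might look roughly like the sketch below. Every path is a placeholder, and the option names should be double-checked against each package's ./configure --help; this is a sketch of the idea (one external PMIx shared by Slurm and Open MPI, plus matching external libevent and hwloc), not a verified recipe.

    # 1. build an external PMIx v1.2.5 (prefix is a placeholder)
    ./configure --prefix=/opt/pmix-1.2.5
    make && make install

    # 2. build Slurm 16.05 against that PMIx
    ./configure --with-pmix=/opt/pmix-1.2.5 ...

    # 3. build Open MPI 2.0.2 against the same PMIx and against
    #    external libevent/hwloc (paths are placeholders)
    ./configure --with-pmix=/opt/pmix-1.2.5 \
                --with-libevent=/usr --with-hwloc=/usr ...

    # alternative: skip PMIx and build Open MPI against Slurm's PMI library,
    # i.e. Ralph's "just build OMPI against the Slurm PMI library" option
    ./configure --with-pmi=/usr ...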