I think that OpenMPI is supposed to support SLURM integration such that

    srun ./hello-mpi

should work?
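(For context, the submission side is nothing exotic; the batch script is essentially the sketch below, where the job name and resource counts are only illustrative, not the exact values from the job whose output appears further down.)

    #!/bin/bash
    # Resource counts below are illustrative only.
    #SBATCH --job-name=hello-mpi
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=1
    #SBATCH --time=00:05:00

    srun ./hello-mpi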
I built OMPI 2.1.2 with

    export CONFIGURE_FLAGS='--disable-dlopen --enable-shared'
    export COMPILERS='CC=gcc CXX=g++ FC=gfortran F77=gfortran'

    CMD="./configure \
        --prefix=${PREFIX} \
        --mandir=${PREFIX}/share/man \
        --with-slurm \
        --with-pmi \
        --with-lustre \
        --with-verbs \
        $CONFIGURE_FLAGS \
        $COMPILERS"

I have a simple hello-mpi.c (source included below), which compiles and
runs with mpirun, both on the login node and in a job.  However, when I
try to use srun in place of mpirun, I get a hung job instead, which upon
cancellation produces this output:

    [bn2.stage.arc-ts.umich.edu:116377] PMI_Init [pmix_s1.c:162:s1_init]: PMI is not initialized
    [bn1.stage.arc-ts.umich.edu:36866] PMI_Init [pmix_s1.c:162:s1_init]: PMI is not initialized
    [warn] opal_libevent2022_event_active: event has no event_base set.
    [warn] opal_libevent2022_event_active: event has no event_base set.
    slurmstepd: error: *** STEP 86.0 ON bn1 CANCELLED AT 2017-11-16T10:03:24 ***
    srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
    slurmstepd: error: *** JOB 86 ON bn1 CANCELLED AT 2017-11-16T10:03:24 ***

The SLURM web page suggests that OMPI 2.x and later support PMIx and says
to use `srun --mpi=pmix`; however, that no longer seems to be an option,
and the `openmpi` type isn't working either (neither is pmi2):

    [bennet@beta-build hello]$ srun --mpi=list
    srun: MPI types are...
    srun: mpi/pmi2
    srun: mpi/lam
    srun: mpi/openmpi
    srun: mpi/mpich1_shmem
    srun: mpi/none
    srun: mpi/mvapich
    srun: mpi/mpich1_p4
    srun: mpi/mpichgm
    srun: mpi/mpichmx

To get Intel MPI to work with srun, I have to set

    I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so

Is there a comparable environment variable that must be set to enable
`srun` to work with this OpenMPI build?  Am I missing a build option, or
misspecifying one?

-- bennet

Source of hello-mpi.c
==========================================

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main(int argc, char **argv){

  int rank;            /* rank of process */
  int numprocs;        /* size of COMM_WORLD */
  int namelen;
  int tag = 10;        /* expected tag */
  int message;         /* received message */
  char processor_name[MPI_MAX_PROCESSOR_NAME];
  MPI_Status status;   /* status of recv */

  /* call Init, size, and rank */
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Get_processor_name(processor_name, &namelen);

  printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);

  if(rank != 0){
    MPI_Recv(&message,       /* buffer for message */
             1,              /* max count to recv */
             MPI_INT,        /* type to recv */
             0,              /* recv from rank 0 only */
             tag,            /* tag of message */
             MPI_COMM_WORLD, /* communicator to use */
             &status);       /* status object */
    printf("Hello from process %d!\n", rank);
  }
  else{
    /* rank 0 ONLY executes this */
    printf("MPI_COMM_WORLD is %d processes big!\n", numprocs);
    int x;
    for(x = 1; x < numprocs; x++){
      MPI_Send(&x,              /* send x to process x */
               1,               /* number of elements to send */
               MPI_INT,         /* type to send */
               x,               /* rank to send to */
               tag,             /* tag for message */
               MPI_COMM_WORLD); /* communicator to use */
    }
  }  /* end else */

  /* always call at end */
  MPI_Finalize();

  return 0;
}
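(For reference, the program above is compiled with the Open MPI wrapper compiler and launched in the ordinary way; the commands are essentially these, with no extra options:)

    mpicc -o hello-mpi hello-mpi.c

    # works, both on the login node and inside a job:
    mpirun ./hello-mpi

    # hangs inside a job until the step is cancelled:
    srun ./hello-mpi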