Howard,

Thanks for the reply.
I think, based on a previous reply, that we may not have the right combination of PMI and Slurm lined up. I will have to coordinate with our admin, who compiles and installs Slurm, and once we think we have Slurm built with PMIx, I'll try again and post the files and information you suggest. Thanks for telling me which files and output are most useful here.

-- bennet

On Fri, Nov 17, 2017 at 11:45 PM, Howard Pritchard <hpprit...@gmail.com> wrote:
> Hello Bennet,
>
> What you are trying to do using srun as the job launcher should work.
> Could you post the contents of /etc/slurm/slurm.conf for your system?
>
> Could you also post the output of the following command to the mail list:
>
>     ompi_info --all | grep pmix
>
> The config.log from your build would also be useful.
>
> Howard
>
> 2017-11-16 9:30 GMT-07:00 r...@open-mpi.org <r...@open-mpi.org>:
>
>> What Charles said was true but not quite complete. We still support the
>> older PMI libraries, but you likely have to point us to wherever Slurm
>> put them.
>>
>> However, we definitely recommend using PMIx, as you will get a faster
>> launch.
>>
>> Sent from my iPad
>>
>> > On Nov 16, 2017, at 9:11 AM, Bennet Fauber <ben...@umich.edu> wrote:
>> >
>> > Charlie,
>> >
>> > Thanks a ton! Yes, we are missing two of the three steps.
>> >
>> > Will report back after we get PMIx installed and after we rebuild
>> > Slurm. We do have a new enough version of it, at least, so we might
>> > have missed the target, but we did at least hit the barn. ;-)
>> >
>> >> On Thu, Nov 16, 2017 at 10:54 AM, Charles A Taylor <chas...@ufl.edu> wrote:
>> >> Hi Bennet,
>> >>
>> >> Three things...
>> >>
>> >> 1. OpenMPI 2.x requires PMIx in lieu of pmi1/pmi2.
>> >>
>> >> 2. You will need Slurm 16.05 or greater built with --with-pmix.
>> >>
>> >> 2a. You will need PMIx 1.1.5, which you can get from GitHub
>> >> (https://github.com/pmix/tarballs).
>> >>
>> >> 3. Then, to launch your MPI tasks on the allocated resources:
>> >>
>> >>     srun --mpi=pmix ./hello-mpi
>> >>
>> >> I'm replying to the list because
>> >>
>> >> a) this information is harder to find than you might think, and
>> >> b) someone/anyone can correct me if I'm giving a bum steer.
>> >>
>> >> Hope this helps,
>> >>
>> >> Charlie Taylor
>> >> University of Florida
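
For my own reference (and in case I have misread them), here is a rough
sketch of what I take Charlie's three steps to mean in practice. The install
prefixes below are placeholders, not our actual layout, and this is not a
tested recipe:

    # 2a. Build PMIx 1.1.5 from the release tarball
    ./configure --prefix=/opt/pmix/1.1.5
    make && make install

    # 2.  Build Slurm 16.05 or later against that PMIx installation
    ./configure --prefix=/opt/slurm --with-pmix=/opt/pmix/1.1.5
    make && make install

    # 3.  Launch MPI tasks through Slurm's PMIx plugin
    srun --mpi=pmix ./hello-mpi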
>> >> On Nov 16, 2017, at 10:34 AM, Bennet Fauber <ben...@umich.edu> wrote:
>> >>
>> >> I think that OpenMPI is supposed to support SLURM integration such that
>> >>
>> >>     srun ./hello-mpi
>> >>
>> >> should work? I built OMPI 2.1.2 with
>> >>
>> >>     export CONFIGURE_FLAGS='--disable-dlopen --enable-shared'
>> >>     export COMPILERS='CC=gcc CXX=g++ FC=gfortran F77=gfortran'
>> >>
>> >>     CMD="./configure \
>> >>         --prefix=${PREFIX} \
>> >>         --mandir=${PREFIX}/share/man \
>> >>         --with-slurm \
>> >>         --with-pmi \
>> >>         --with-lustre \
>> >>         --with-verbs \
>> >>         $CONFIGURE_FLAGS \
>> >>         $COMPILERS"
>> >>
>> >> I have a simple hello-mpi.c (source included below), which compiles
>> >> and runs with mpirun, both on the login node and in a job. However,
>> >> when I try to use srun in place of mpirun, I instead get a hung job,
>> >> which upon cancellation produces this output:
>> >>
>> >>     [bn2.stage.arc-ts.umich.edu:116377] PMI_Init [pmix_s1.c:162:s1_init]: PMI is not initialized
>> >>     [bn1.stage.arc-ts.umich.edu:36866] PMI_Init [pmix_s1.c:162:s1_init]: PMI is not initialized
>> >>     [warn] opal_libevent2022_event_active: event has no event_base set.
>> >>     [warn] opal_libevent2022_event_active: event has no event_base set.
>> >>     slurmstepd: error: *** STEP 86.0 ON bn1 CANCELLED AT 2017-11-16T10:03:24 ***
>> >>     srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
>> >>     slurmstepd: error: *** JOB 86 ON bn1 CANCELLED AT 2017-11-16T10:03:24 ***
>> >>
>> >> The SLURM web page suggests that OMPI 2.x and later support PMIx, and
>> >> to use `srun --mpi=pmix`; however, that no longer seems to be an
>> >> option, and using the `openmpi` type isn't working (neither is pmi2).
>> >>
>> >>     [bennet@beta-build hello]$ srun --mpi=list
>> >>     srun: MPI types are...
>> >>     srun: mpi/pmi2
>> >>     srun: mpi/lam
>> >>     srun: mpi/openmpi
>> >>     srun: mpi/mpich1_shmem
>> >>     srun: mpi/none
>> >>     srun: mpi/mvapich
>> >>     srun: mpi/mpich1_p4
>> >>     srun: mpi/mpichgm
>> >>     srun: mpi/mpichmx
>> >>
>> >> To get the Intel PMI to work with srun, I have to set
>> >>
>> >>     I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
>> >>
>> >> Is there a comparable environment variable that must be set to enable
>> >> `srun` to work?
>> >>
>> >> Am I missing a build option or misspecifying one?
>> >>
>> >> -- bennet
>> >>
>> >>
>> >> Source of hello-mpi.c
>> >> ==========================================
>> >> #include <stdio.h>
>> >> #include <stdlib.h>
>> >> #include "mpi.h"
>> >>
>> >> int main(int argc, char **argv){
>> >>
>> >>     int rank;           /* rank of process */
>> >>     int numprocs;       /* size of COMM_WORLD */
>> >>     int namelen;
>> >>     int tag=10;         /* expected tag */
>> >>     int message;        /* Recv'd message */
>> >>     char processor_name[MPI_MAX_PROCESSOR_NAME];
>> >>     MPI_Status status;  /* status of recv */
>> >>
>> >>     /* call Init, size, and rank */
>> >>     MPI_Init(&argc, &argv);
>> >>     MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
>> >>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>> >>     MPI_Get_processor_name(processor_name, &namelen);
>> >>
>> >>     printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);
>> >>
>> >>     if(rank != 0){
>> >>         MPI_Recv(&message,        /* buffer for message */
>> >>                  1,               /* MAX count to recv */
>> >>                  MPI_INT,         /* type to recv */
>> >>                  0,               /* recv from 0 only */
>> >>                  tag,             /* tag of message */
>> >>                  MPI_COMM_WORLD,  /* communicator to use */
>> >>                  &status);        /* status object */
>> >>         printf("Hello from process %d!\n", rank);
>> >>     }
>> >>     else{
>> >>         /* rank 0 ONLY executes this */
>> >>         printf("MPI_COMM_WORLD is %d processes big!\n", numprocs);
>> >>         int x;
>> >>         for(x=1; x<numprocs; x++){
>> >>             MPI_Send(&x,              /* send x to process x */
>> >>                      1,               /* number to send */
>> >>                      MPI_INT,         /* type to send */
>> >>                      x,               /* rank to send to */
>> >>                      tag,             /* tag for message */
>> >>                      MPI_COMM_WORLD); /* communicator to use */
>> >>         }
>> >>     } /* end else */
>> >>
>> >>     /* always call at end */
>> >>     MPI_Finalize();
>> >>
>> >>     return 0;
>> >> }
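
If I am following the replies above correctly, the Open MPI build itself
should also be pointed at the same external PMIx that Slurm was built
against. A rough sketch of the configure change I have in mind; the PMIx
path is illustrative, not our real one:

    ./configure --prefix=${PREFIX} \
        --mandir=${PREFIX}/share/man \
        --with-slurm \
        --with-pmix=/opt/pmix/1.1.5 \
        --with-lustre \
        --with-verbs \
        $CONFIGURE_FLAGS \
        $COMPILERS
    make -j 8 && make install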
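
And once `srun --mpi=list` shows an mpi/pmix type, a minimal test job along
these lines should exercise the whole chain (node and task counts below are
just placeholders):

    #!/bin/bash
    #SBATCH --job-name=hello-mpi
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=2
    #SBATCH --time=00:05:00

    # compile against the rebuilt Open MPI, then launch via Slurm's PMIx plugin
    mpicc hello-mpi.c -o hello-mpi
    srun --mpi=pmix ./hello-mpi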
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users