Howard,

Thanks for the reply.

I think, based on a previous reply, that we may not have the right
combination of PMI and Slurm lined up.  I will have to coordinate with our
admin, who compiles and installs Slurm, and once we think we have Slurm with
PMIx, I'll try again and post the files/information you suggest.

Thanks for telling me which files and output are most useful here.

-- bennet



On Fri, Nov 17, 2017 at 11:45 PM, Howard Pritchard <hpprit...@gmail.com> wrote:

> Hello Bennet,
>
> What you are trying to do using srun as the job launcher should work.
> Could you post the contents
> of /etc/slurm/slurm.conf for your system?
>
> Could you also post the output of the following command to the mailing list:
>
> ompi_info --all | grep pmix
>
> The config.log from your build would also be useful.
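>
> For instance (the slurm.conf path is the usual default and may differ on
> your system), the MPI-related lines are the part I'm most interested in:
>
>   grep -iE 'mpidefault|mpiparams|pmix' /etc/slurm/slurm.conf
>
> and config.log sits at the top of the Open MPI build tree.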
>
> Howard
>
> 2017-11-16 9:30 GMT-07:00 r...@open-mpi.org <r...@open-mpi.org>:
>
>> What Charles said was true but not quite complete. We still support the
>> older PMI libraries but you likely have to point us to wherever slurm put
>> them.
>>
>> However, we definitely recommend using PMIx, as you will get a faster launch.
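>>
>> For the older route, that usually means telling Open MPI's configure where
>> Slurm's PMI libraries live; a sketch, assuming Slurm put libpmi/libpmi2
>> under /usr (check your install):
>>
>>   ./configure --with-slurm --with-pmi=/usr ...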
>>
>> Sent from my iPad
>>
>> > On Nov 16, 2017, at 9:11 AM, Bennet Fauber <ben...@umich.edu> wrote:
>> >
>> > Charlie,
>> >
>> > Thanks a ton!  Yes, we are missing two of the three steps.
>> >
>> > Will report back after we get pmix installed and after we rebuild
>> > Slurm.  We do have a new enough version of it, at least, so we might
>> > have missed the target, but we did at least hit the barn.  ;-)
>> >
>> >
>> >
>> >> On Thu, Nov 16, 2017 at 10:54 AM, Charles A Taylor <chas...@ufl.edu> wrote:
>> >> Hi Bennet,
>> >>
>> >> Three things...
>> >>
>> >> 1. OpenMPI 2.x requires PMIx in lieu of pmi1/pmi2.
>> >>
>> >> 2. You will need slurm 16.05 or greater built with --with-pmix.
>> >>
>> >> 2a. You will need pmix 1.1.5, which you can get from GitHub
>> >> (https://github.com/pmix/tarballs).
>> >>
>> >> 3. Then, to launch your MPI tasks on the allocated resources (the whole
>> >> sequence is sketched below),
>> >>
>> >>   srun --mpi=pmix ./hello-mpi
>> >>
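>> >> Roughly, the end-to-end build order looks like this (the prefixes are
>> >> just examples; adjust for your site):
>> >>
>> >>   # 1. build and install PMIx
>> >>   ./configure --prefix=/opt/pmix && make install
>> >>
>> >>   # 2. build Slurm against that PMIx
>> >>   ./configure --prefix=/opt/slurm --with-pmix=/opt/pmix && make install
>> >>
>> >>   # 3. build Open MPI (2.x bundles its own PMIx; pointing it at the
>> >>   #    external one just keeps the versions aligned)
>> >>   ./configure --prefix=/opt/ompi --with-slurm --with-pmix=/opt/pmix && make install
>> >>
>> >> After that, `srun --mpi=list` should show a pmix entry.
>> >>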
>> >> I’m replying to the list because,
>> >>
>> >> a) this information is harder to find than you might think.
>> >> b) someone/anyone can correct me if I'm giving a bum steer.
>> >>
>> >> Hope this helps,
>> >>
>> >> Charlie Taylor
>> >> University of Florida
>> >>
>> >> On Nov 16, 2017, at 10:34 AM, Bennet Fauber <ben...@umich.edu> wrote:
>> >>
>> >> I think that OpenMPI is supposed to support SLURM integration such that
>> >>
>> >>   srun ./hello-mpi
>> >>
>> >> should work?  I built OMPI 2.1.2 with
>> >>
>> >> export CONFIGURE_FLAGS='--disable-dlopen --enable-shared'
>> >> export COMPILERS='CC=gcc CXX=g++ FC=gfortran F77=gfortran'
>> >>
>> >> CMD="./configure \
>> >>   --prefix=${PREFIX} \
>> >>   --mandir=${PREFIX}/share/man \
>> >>   --with-slurm \
>> >>   --with-pmi \
>> >>   --with-lustre \
>> >>   --with-verbs \
>> >>   $CONFIGURE_FLAGS \
>> >>   $COMPILERS"
>> >>
>> >> I have a simple hello-mpi.c (source included below), which compiles
>> >> and runs with mpirun, both on the login node and in a job.  However,
>> >> when I try to use srun in place of mpirun, the job hangs, and
>> >> cancelling it produces this output.
>> >>
>> >> [bn2.stage.arc-ts.umich.edu:116377] PMI_Init [pmix_s1.c:162:s1_init]: PMI is not initialized
>> >> [bn1.stage.arc-ts.umich.edu:36866] PMI_Init [pmix_s1.c:162:s1_init]: PMI is not initialized
>> >> [warn] opal_libevent2022_event_active: event has no event_base set.
>> >> [warn] opal_libevent2022_event_active: event has no event_base set.
>> >> slurmstepd: error: *** STEP 86.0 ON bn1 CANCELLED AT 2017-11-16T10:03:24 ***
>> >> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
>> >> slurmstepd: error: *** JOB 86 ON bn1 CANCELLED AT 2017-11-16T10:03:24 ***
>> >>
>> >> The SLURM web page suggests that OMPI 2.x and later support PMIx and
>> >> that one should use `srun --mpi=pmix`; however, that no longer seems to
>> >> be an option, and using the `openmpi` type isn't working (neither is pmi2).
>> >>
>> >> [bennet@beta-build hello]$ srun --mpi=list
>> >> srun: MPI types are...
>> >> srun: mpi/pmi2
>> >> srun: mpi/lam
>> >> srun: mpi/openmpi
>> >> srun: mpi/mpich1_shmem
>> >> srun: mpi/none
>> >> srun: mpi/mvapich
>> >> srun: mpi/mpich1_p4
>> >> srun: mpi/mpichgm
>> >> srun: mpi/mpichmx
>> >>
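>> >> One way to double-check whether our Slurm build includes the PMIx plugin
>> >> at all is to look in Slurm's plugin directory (the path below is a guess
>> >> for our install):
>> >>
>> >>   ls /usr/lib64/slurm/ | grep mpi_pmix
>> >>
>> >> If no mpi_pmix*.so turns up there, that would match the --mpi=list output
>> >> above.
>> >>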
>> >> To get the Intel PMI to work with srun, I have to set
>> >>
>> >>   I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
>> >>
>> >> Is there a comparable environment variable that must be set to enable
>> >> `srun` to work?
>> >>
>> >> Am I missing a build option or misspecifying one?
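>> >>
>> >> (For what it's worth, my understanding is that with the pmi2 plugin that
>> >> does appear in the list above, an Open MPI build that actually found
>> >> Slurm's PMI at configure time shouldn't need an extra environment
>> >> variable, e.g.
>> >>
>> >>   srun --mpi=pmi2 ./hello-mpi
>> >>
>> >> but that may be exactly the part I have wrong.)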
>> >>
>> >> -- bennet
>> >>
>> >>
>> >> Source of hello-mpi.c
>> >> ==========================================
>> >> #include <stdio.h>
>> >> #include <stdlib.h>
>> >> #include "mpi.h"
>> >>
>> >> int main(int argc, char **argv){
>> >>
>> >> int rank;          /* rank of process */
>> >> int numprocs;      /* size of COMM_WORLD */
>> >> int namelen;
>> >> int tag=10;        /* expected tag */
>> >> int message;       /* Recv'd message */
>> >> char processor_name[MPI_MAX_PROCESSOR_NAME];
>> >> MPI_Status status; /* status of recv */
>> >>
>> >> /* call Init, size, and rank */
>> >> MPI_Init(&argc, &argv);
>> >> MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
>> >> MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>> >> MPI_Get_processor_name(processor_name, &namelen);
>> >>
>> >> printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);
>> >>
>> >> if(rank != 0){
>> >>   MPI_Recv(&message,    /*buffer for message */
>> >>                   1,    /*MAX count to recv */
>> >>             MPI_INT,    /*type to recv */
>> >>                   0,    /*recv from 0 only */
>> >>                 tag,    /*tag of message */
>> >>      MPI_COMM_WORLD,    /*communicator to use */
>> >>             &status);   /*status object */
>> >>   printf("Hello from process %d!\n",rank);
>> >> }
>> >> else{
>> >>   /* rank 0 ONLY executes this */
>> >>   printf("MPI_COMM_WORLD is %d processes big!\n", numprocs);
>> >>   int x;
>> >>   for(x=1; x<numprocs; x++){
>> >>      MPI_Send(&x,          /*send x to process x */
>> >>                1,          /*number to send */
>> >>          MPI_INT,          /*type to send */
>> >>                x,          /*rank to send to */
>> >>              tag,          /*tag for message */
>> >>    MPI_COMM_WORLD);        /*communicator to use */
>> >>   }
>> >> } /* end else */
>> >>
>> >>
>> >> /* always call at end */
>> >> MPI_Finalize();
>> >>
>> >> return 0;
>> >> }
>>
>>
>
>
>
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
