I think that Open MPI is supposed to support SLURM integration such that

    srun ./hello-mpi

should just work.  I built OMPI 2.1.2 with

export CONFIGURE_FLAGS='--disable-dlopen --enable-shared'
export COMPILERS='CC=gcc CXX=g++ FC=gfortran F77=gfortran'

CMD="./configure \
    --prefix=${PREFIX} \
    --mandir=${PREFIX}/share/man \
    --with-slurm \
    --with-pmi \
    --with-lustre \
    --with-verbs \
    $CONFIGURE_FLAGS \
    $COMPILERS"
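
If it helps with diagnosis, the PMI/PMIx components that actually got
built can presumably be checked with something like this (assuming
ompi_info is the one from this install):

    ompi_info | grep -i pmi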

I have a simple hello-mpi.c (source included below), which compiles
and runs with mpirun, both on the login node and inside a job.  However,
when I use srun in place of mpirun, the job hangs, and cancelling it
produces this output:

[bn2.stage.arc-ts.umich.edu:116377] PMI_Init [pmix_s1.c:162:s1_init]:
PMI is not initialized
[bn1.stage.arc-ts.umich.edu:36866] PMI_Init [pmix_s1.c:162:s1_init]:
PMI is not initialized
[warn] opal_libevent2022_event_active: event has no event_base set.
[warn] opal_libevent2022_event_active: event has no event_base set.
slurmstepd: error: *** STEP 86.0 ON bn1 CANCELLED AT 2017-11-16T10:03:24 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: *** JOB 86 ON bn1 CANCELLED AT 2017-11-16T10:03:24 ***
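
For reference, the compile step and the two launch methods are
essentially these:

    mpicc hello-mpi.c -o hello-mpi    # Open MPI compiler wrapper
    mpirun ./hello-mpi                # works, on the login node and in a job
    srun ./hello-mpi                  # hangs with the PMI errors above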

The SLURM web page suggests that OMPI 2.x and later support PMIx and
that one should use `srun --mpi=pmix`, but that no longer appears to be
an option here, and neither the `openmpi` nor the `pmi2` type works
(the exact commands I tried are shown after the list below).

[bennet@beta-build hello]$ srun --mpi=list
srun: MPI types are...
srun: mpi/pmi2
srun: mpi/lam
srun: mpi/openmpi
srun: mpi/mpich1_shmem
srun: mpi/none
srun: mpi/mvapich
srun: mpi/mpich1_p4
srun: mpi/mpichgm
srun: mpi/mpichmx
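
To be explicit, these are the srun variations I tried, and neither
works.  The last line is, I assume, one way to check whether this SLURM
install even has a PMIx plugin; the plugin directory is a guess for our
system and may differ:

    srun --mpi=pmi2 ./hello-mpi       # does not work either
    srun --mpi=openmpi ./hello-mpi    # does not work either
    ls /usr/lib64/slurm/mpi_*.so      # look for an mpi_pmix plugin (path is a guess)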

To get Intel MPI to work with srun, I have to set

    I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
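
That is, the full pattern that works there is roughly the following
(the binary name is just a placeholder for a hello world built against
Intel MPI):

    export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
    srun ./hello-impi    # placeholder name; built with Intel MPI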

Is there a comparable environment variable that must be set to enable
`srun` to work?

Am I missing a build option or misspecifying one?

-- bennet


Source of hello-mpi.c
==========================================
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main(int argc, char **argv){

  int rank;          /* rank of process */
  int numprocs;      /* size of COMM_WORLD */
  int namelen;
  int tag=10;        /* expected tag */
  int message;       /* Recv'd message */
  char processor_name[MPI_MAX_PROCESSOR_NAME];
  MPI_Status status; /* status of recv */

  /* call Init, size, and rank */
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Get_processor_name(processor_name, &namelen);

  printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);

  if(rank != 0){
    MPI_Recv(&message,    /*buffer for message */
                    1,    /*MAX count to recv */
              MPI_INT,    /*type to recv */
                    0,    /*recv from 0 only */
                  tag,    /*tag of message */
       MPI_COMM_WORLD,    /*communicator to use */
              &status);   /*status object */
    printf("Hello from process %d!\n",rank);
  }
  else{
    /* rank 0 ONLY executes this */
    printf("MPI_COMM_WORLD is %d processes big!\n", numprocs);
    int x;
    for(x=1; x<numprocs; x++){
       MPI_Send(&x,              /* send x to process x */
                1,               /* number to send */
                MPI_INT,         /* type to send */
                x,               /* rank to send to */
                tag,             /* tag for message */
                MPI_COMM_WORLD); /* communicator to use */
    }
  } /* end else */


  /* always call at end */
  MPI_Finalize();

  return 0;
}