I might be off base here, but I think what was implied is that you've built Open MPI with --with-pmi without supplying the path that holds the pmi2 libraries.

First build Slurm with PMI support, then build Open MPI with the path to the PMI .so that Slurm installed.

That might not provide pmix, but it will work with pmi2.

--with-pmi=<dir> or --with-pmi-libdir=<dir>
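
For example, a minimal sketch, assuming Slurm's PMI headers and libraries
landed under /opt/slurm (adjust both paths to wherever your Slurm install
actually put them):

  # paths below are illustrative; use your actual Slurm install location
  ./configure --prefix=$PREFIX \
      --with-slurm \
      --with-pmi=/opt/slurm \
      --with-pmi-libdir=/opt/slurm/lib64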



On 18/11/2017 19:03, Bennet Fauber wrote:
Howard,

Thanks for the reply.

I think, based on a previous reply, that we may not have the right combination of PMI and Slurm lined up.  I will have to coordinate with our admin who compiles and installs Slurm, and once we think we have Slurm with PMIx, I'll try again and post the files/information you suggest.

Thanks for telling me which files and output are most useful here.

-- bennet



On Fri, Nov 17, 2017 at 11:45 PM, Howard Pritchard <hpprit...@gmail.com> wrote:
Hello Bennet,

What you are trying to do, using srun as the job launcher, should work.  Could you post the contents
of /etc/slurm/slurm.conf for your system?  
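
(The MPI-related lines are the interesting part; on a PMIx-enabled install
you would typically expect something like

  MpiDefault=pmix

though pmi2 or none are also common defaults.)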

Could you also post the output of the following command:

ompi_info --all | grep pmix

to the mailing list.

The config.log from your build would also be useful.

Howard


2017-11-16 9:30 GMT-07:00 r...@open-mpi.org <r...@open-mpi.org>:
What Charles said was true but not quite complete. We still support the older PMI libraries, but you will likely have to point us to wherever Slurm put them.

However, we definitely recommend using PMIx, as you will get a faster launch.
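
For example (directories are illustrative; point them at wherever your
Slurm and PMIx installs actually live):

  # older PMI-1/PMI-2 route, using the libraries Slurm installed
  ./configure --prefix=/opt/openmpi --with-slurm --with-pmi=/opt/slurm

  # recommended route, building against an external PMIx
  ./configure --prefix=/opt/openmpi --with-slurm --with-pmix=/opt/pmix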

Sent from my iPad

> On Nov 16, 2017, at 9:11 AM, Bennet Fauber <ben...@umich.edu> wrote:
>
> Charlie,
>
> Thanks a ton!  Yes, we are missing two of the three steps.
>
> Will report back after we get pmix installed and after we rebuild
> Slurm.  We do have a new enough version of it, at least, so we might
> have missed the target, but we did at least hit the barn.  ;-)
>
>
>
>> On Thu, Nov 16, 2017 at 10:54 AM, Charles A Taylor <chas...@ufl.edu> wrote:
>> Hi Bennet,
>>
>> Three things...
>>
>> 1. OpenMPI 2.x requires PMIx in lieu of pmi1/pmi2.
>>
>> 2. You will need Slurm 16.05 or greater built with --with-pmix
>>
>> 2a. You will need pmix 1.1.5 which you can get from github.
>> (https://github.com/pmix/tarballs).
>>
>> 3. then, to launch your mpi tasks on the allocated resources,
>>
>>   srun --mpi=pmix ./hello-mpi
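>>
>> A rough sketch of that build order (the install prefixes below are
>> illustrative, not required locations):
>>
>>   # build and install PMIx 1.1.5 first
>>   ./configure --prefix=/opt/pmix/1.1.5 && make && make install
>>
>>   # build Slurm (>= 16.05) against it
>>   ./configure --prefix=/opt/slurm --with-pmix=/opt/pmix/1.1.5
>>
>>   # build Open MPI against the same PMIx
>>   ./configure --prefix=/opt/openmpi --with-slurm --with-pmix=/opt/pmix/1.1.5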
>>
>> I’m replying to the list because,
>>
>> a) this information is harder to find than you might think.
>> b) someone/anyone can correct me if I'm giving a bum steer.
>>
>> Hope this helps,
>>
>> Charlie Taylor
>> University of Florida
>>
>> On Nov 16, 2017, at 10:34 AM, Bennet Fauber <ben...@umich.edu> wrote:
>>
>> I think that OpenMPI is supposed to support SLURM integration such that
>>
>>   srun ./hello-mpi
>>
>> should work?  I built OMPI 2.1.2 with
>>
>> export CONFIGURE_FLAGS='--disable-dlopen --enable-shared'
>> export COMPILERS='CC=gcc CXX=g++ FC=gfortran F77=gfortran'
>>
>> CMD="./configure \
>>   --prefix=${PREFIX} \
>>   --mandir=${PREFIX}/share/man \
>>   --with-slurm \
>>   --with-pmi \
>>   --with-lustre \
>>   --with-verbs \
>>   $CONFIGURE_FLAGS \
>>   $COMPILERS"
>>
>> I have a simple hello-mpi.c (source included below), which compiles
>> and runs with mpirun, both on the login node and in a job.  However,
>> when I try to use srun in place of mpirun, I instead get a hung job,
>> which upon cancellation produces this output.
>>
>> [bn2.stage.arc-ts.umich.edu:116377] PMI_Init [pmix_s1.c:162:s1_init]:
>> PMI is not initialized
>> [bn1.stage.arc-ts.umich.edu:36866] PMI_Init [pmix_s1.c:162:s1_init]:
>> PMI is not initialized
>> [warn] opal_libevent2022_event_active: event has no event_base set.
>> [warn] opal_libevent2022_event_active: event has no event_base set.
>> slurmstepd: error: *** STEP 86.0 ON bn1 CANCELLED AT 2017-11-16T10:03:24 ***
>> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
>> slurmstepd: error: *** JOB 86 ON bn1 CANCELLED AT 2017-11-16T10:03:24 ***
>>
>> The SLURM web page suggests that OMPI 2.x and later support PMIx and
>> that one should use `srun --mpi=pmix`; however, that no longer seems to be
>> an option, and using the `openmpi` type isn't working (neither is pmi2).
>>
>> [bennet@beta-build hello]$ srun --mpi=list
>> srun: MPI types are...
>> srun: mpi/pmi2
>> srun: mpi/lam
>> srun: mpi/openmpi
>> srun: mpi/mpich1_shmem
>> srun: mpi/none
>> srun: mpi/mvapich
>> srun: mpi/mpich1_p4
>> srun: mpi/mpichgm
>> srun: mpi/mpichmx
>>
>> To get the Intel PMI to work with srun, I have to set
>>
>>   I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
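>>
>> For example, the full Intel MPI invocation looks roughly like this (the
>> library path is whatever your site's Slurm provides):
>>
>>   # path to Slurm's libpmi is site-specific
>>   export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
>>   srun ./hello-mpi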
>>
>> Is there a comparable environment variable that must be set to enable
>> `srun` to work?
>>
>> Am I missing a build option or misspecifying one?
>>
>> -- bennet
>>
>>
>> Source of hello-mpi.c
>> ==========================================
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include "mpi.h"
>>
>> int main(int argc, char **argv){
>>
>> int rank;          /* rank of process */
>> int numprocs;      /* size of COMM_WORLD */
>> int namelen;
>> int tag=10;        /* expected tag */
>> int message;       /* Recv'd message */
>> char processor_name[MPI_MAX_PROCESSOR_NAME];
>> MPI_Status status; /* status of recv */
>>
>> /* call Init, size, and rank */
>> MPI_Init(&argc, &argv);
>> MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
>> MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>> MPI_Get_processor_name(processor_name, &namelen);
>>
>> printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);
>>
>> if(rank != 0){
>>   MPI_Recv(&message,    /*buffer for message */
>>                   1,    /*MAX count to recv */
>>             MPI_INT,    /*type to recv */
>>                   0,    /*recv from 0 only */
>>                 tag,    /*tag of message */
>>      MPI_COMM_WORLD,    /*communicator to use */
>>             &status);   /*status object */
>>   printf("Hello from process %d!\n",rank);
>> }
>> else{
>>   /* rank 0 ONLY executes this */
>>   printf("MPI_COMM_WORLD is %d processes big!\n", numprocs);
>>   int x;
>>   for(x=1; x<numprocs; x++){
>>      MPI_Send(&x,          /*send x to process x */
>>                1,          /*number to send */
>>          MPI_INT,          /*type to send */
>>                x,          /*rank to send to */
>>              tag,          /*tag for message */
>>    MPI_COMM_WORLD);        /*communicator to use */
>>   }
>> } /* end else */
>>
>>
>> /* always call at end */
>> MPI_Finalize();
>>
>> return 0;
>> }