Hi Gilles,

With respect to your comment about not using --with-FOO=/usr: it is bad practice, sure, and it should be unnecessary, but we have had at least one instance where it was also necessary for the requested feature to actually work. The case I am thinking of was Open MPI 1.10.2, which did not properly bind processes to cores unless we built it --with-hwloc=/usr.
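For reference, the rebuild that fixed it for us looked roughly like the following; the install prefix and the other options are placeholders for a typical invocation, and --with-hwloc=/usr is the part that mattered:

    # Open MPI 1.10.2 rebuild that restored proper core binding for us.
    # Prefix and -j value are illustrative; --with-hwloc=/usr is the key flag.
    ./configure --prefix=/opt/openmpi/1.10.2 --with-slurm --with-hwloc=/usr
    make -j 8 && make install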
When that flag wasn't used, things mostly ran fine, but one program would occasionally end up with processes hopping between cores and very poor performance. After rebuilding --with-hwloc=/usr, that is no longer a problem. It was difficult to pin down because it did not happen every run; maybe one run in five showed the dragging performance. So, while it may be true that this should not be necessary and is not good practice, it also seems that sometimes it is necessary.

-- bennet


On Sun, Feb 24, 2019 at 5:21 AM Gilles Gouaillardet
<gilles.gouaillar...@gmail.com> wrote:
>
> Passant,
>
> The fix is included in PMIx 2.2.2.
>
> The bug is in a public header file, so you might indeed have to
> rebuild the SLURM plugin for PMIx.
> I did not check the SLURM sources, though; assuming PMIx was built as
> a shared library, there is still a chance it will work even if you do
> not rebuild the SLURM plugin. I'd rebuild at least the SLURM plugin
> for PMIx to be on the safe side.
>
> Cheers,
>
> Gilles
>
> On Sun, Feb 24, 2019 at 4:07 PM Passant A. Hafez
> <passant.ha...@kaust.edu.sa> wrote:
> >
> > Thanks Gilles.
> >
> > So do we have to rebuild Slurm after applying this patch?
> >
> > Another question: is this fix included in PMIx 2.2.2
> > (https://github.com/pmix/pmix/releases/tag/v2.2.2)?
> >
> >
> > All the best,
> >
> > ________________________________________
> > From: users <users-boun...@lists.open-mpi.org> on behalf of Gilles
> > Gouaillardet <gilles.gouaillar...@gmail.com>
> > Sent: Sunday, February 24, 2019 4:09 AM
> > To: Open MPI Users
> > Subject: Re: [OMPI users] Building PMIx and Slurm support
> >
> > Passant,
> >
> > You have to manually download and apply
> > https://github.com/pmix/pmix/commit/2e2f4445b45eac5a3fcbd409c81efe318876e659.patch
> > to PMIx 2.2.1; that should likely fix your problem.
> >
> > As a side note, it is bad practice to configure --with-FOO=/usr,
> > since it might have some unexpected side effects.
> > Instead, you can replace
> >
> > configure --with-slurm --with-pmix=/usr --with-pmi=/usr --with-libevent=/usr
> >
> > with
> >
> > configure --with-slurm --with-pmix=external --with-pmi --with-libevent=external
> >
> > To be on the safe side, I also invite you to pass --with-hwloc=external
> > on the configure command line.
> >
> > Cheers,
> >
> > Gilles
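Concretely, applying the patch Gilles references and then building Open MPI against the external components would look roughly like this; the PMIx source directory name, install prefix, and -j value are assumptions about the local setup:

    # Fetch and apply the PMIx 2.2.1 fix referenced above, then rebuild PMIx.
    cd pmix-2.2.1
    curl -LO https://github.com/pmix/pmix/commit/2e2f4445b45eac5a3fcbd409c81efe318876e659.patch
    patch -p1 < 2e2f4445b45eac5a3fcbd409c81efe318876e659.patch
    # Reinstall to the same prefix the SLURM PMIx plugin was built
    # against (assumed to be /usr here).
    ./configure --prefix=/usr && make -j 8 && make install

    # Then, in the Open MPI source tree, configure against the external
    # (system) components as suggested above:
    ./configure --with-slurm --with-pmix=external --with-pmi \
                --with-libevent=external --with-hwloc=external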
> > On Sun, Feb 24, 2019 at 1:54 AM Passant A. Hafez
> > <passant.ha...@kaust.edu.sa> wrote:
> > >
> > > Hello Gilles,
> > >
> > > Here are some details:
> > >
> > > Slurm 18.08.4
> > >
> > > PMIx 2.2.1 (as shown in /usr/include/pmix_version.h)
> > >
> > > Libevent 2.0.21
> > >
> > > srun --mpi=list
> > > srun: MPI types are...
> > > srun: none
> > > srun: openmpi
> > > srun: pmi2
> > > srun: pmix
> > > srun: pmix_v2
> > >
> > > Open MPI versions tested: 4.0.0 and 3.1.2
> > >
> > > For each installation mentioned below, a separate MPI Hello World
> > > program was compiled.
> > > Jobs were submitted with sbatch (2 nodes * 2 tasks per node), and the
> > > program was launched with srun --mpi=pmix program.
> > >
> > > File 400ext_2x2.out (attached) is for the OMPI 4.0.0 installation with
> > > configure options:
> > > --with-slurm --with-pmix=/usr --with-pmi=/usr --with-libevent=/usr
> > > and configure log:
> > > Libevent support: external
> > > PMIx support: External (2x)
> > >
> > > File 400int_2x2.out (attached) is for the OMPI 4.0.0 installation with
> > > configure options:
> > > --with-slurm --with-pmix
> > > and configure log:
> > > Libevent support: internal (external libevent version is less than
> > > internal version 2.0.22)
> > > PMIx support: Internal
> > >
> > > I also tested different installations of 3.1.2 and got errors similar
> > > to 400ext_2x2.out
> > > (NOT-SUPPORTED in file event/pmix_event_registration.c at line 101).
> > >
> > >
> > > All the best,
> > > --
> > > Passant A. Hafez | HPC Applications Specialist
> > > KAUST Supercomputing Core Laboratory (KSL)
> > > King Abdullah University of Science and Technology
> > > Building 1, Al-Khawarizmi, Room 0123
> > > Mobile : +966 (0) 55-247-9568
> > > Mobile : +20 (0) 106-146-9644
> > > Office : +966 (0) 12-808-0367
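The test setup described above corresponds to a job script along these lines; the executable name is a placeholder for the MPI Hello World binary:

    #!/bin/bash
    # 2 nodes x 2 tasks per node, launched through the Slurm PMIx plugin.
    #SBATCH --job-name=pmix_hello
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=2

    srun --mpi=pmix ./hello_world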
> > > ________________________________________
> > > From: users <users-boun...@lists.open-mpi.org> on behalf of Gilles
> > > Gouaillardet <gilles.gouaillar...@gmail.com>
> > > Sent: Saturday, February 23, 2019 5:17 PM
> > > To: Open MPI Users
> > > Subject: Re: [OMPI users] Building PMIx and Slurm support
> > >
> > > Hi,
> > >
> > > PMIx has cross-version compatibility, so as long as the PMIx library
> > > used by SLURM is compatible with the one (internal or external) used
> > > by Open MPI, you should be fine.
> > > If you want to minimize the risk of cross-version incompatibility,
> > > then I encourage you to build Open MPI with the same (and hence
> > > external) PMIx that was used to build SLURM.
> > >
> > > Can you tell us a bit more than "it didn't work"?
> > > (Open MPI version, PMIx version used by SLURM, PMIx version used by
> > > Open MPI, error message, ...)
> > >
> > > Cheers,
> > >
> > > Gilles
> > >
> > > On Sat, Feb 23, 2019 at 9:46 PM Passant A. Hafez
> > > <passant.ha...@kaust.edu.sa> wrote:
> > > >
> > > >
> > > > Good day everyone,
> > > >
> > > > I've been trying to build and use the PMIx support for Open MPI; I
> > > > tried many things (which I can list if needed), but with no luck.
> > > > I was able to test the PMIx client, but when I ran OMPI with
> > > > srun --mpi=pmix it didn't work.
> > > >
> > > > So if you could please advise me on the versions of PMIx and Open
> > > > MPI that should work well with Slurm 18.08, that would be great.
> > > >
> > > > Also, what is the difference between using internal vs external
> > > > PMIx installations?
> > > >
> > > >
> > > > All the best,
> > > >
> > > > --
> > > > Passant A. Hafez | HPC Applications Specialist
> > > > KAUST Supercomputing Core Laboratory (KSL)
> > > > King Abdullah University of Science and Technology
> > > > Building 1, Al-Khawarizmi, Room 0123
> > > > Mobile : +966 (0) 55-247-9568
> > > > Mobile : +20 (0) 106-146-9644
> > > > Office : +966 (0) 12-808-0367

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users