Re: [OMPI users] Open MPI + Slurm + lmod

2022-01-24 Thread Gilles Gouaillardet via users
Matthias, do you run the MPI application with mpirun or srun? The error log suggests you are using srun, and SLURM provides only PMI support. If this is the case, then you have three options: - use mpirun - rebuild Open MPI with PMI support as Ralph previously explained - use SLURM PMIx:
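The choice between the first and third options above can be sketched in shell. This is a minimal sketch, not a definitive recipe: it assumes `srun --mpi=list` reports the PMI plugin types the local Slurm installation supports, `pick_launcher` is a hypothetical helper name, and whether `srun --mpi=pmix` then works also depends on how Open MPI itself was built.

```shell
# Hypothetical helper: suggest a launcher from the plugin list srun reports.
pick_launcher() {
    # $1: text listing the MPI plugin types srun supports
    case "$1" in
        *pmix*) echo "srun --mpi=pmix" ;;   # Slurm offers PMIx: srun can launch directly
        *)      echo "mpirun" ;;            # otherwise fall back to mpirun
    esac
}

if command -v srun >/dev/null 2>&1; then
    # Some Slurm versions print the list on stderr, so capture both streams.
    plugins=$(srun --mpi=list 2>&1)
else
    plugins=""   # no Slurm on this machine (assumption: mpirun is the fallback)
fi

echo "suggested launcher: $(pick_launcher "$plugins")"
```

On a machine without Slurm the script simply suggests mpirun; on a cluster it only reports what Slurm advertises and does not test an actual launch.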

Re: [OMPI users] Open MPI + Slurm + lmod

2022-01-24 Thread Ralph Castain via users
You should probably ask them - I see in the top one that they used a platform file, which likely had the missing option in it. The bottom one does not use that platform file, so it was probably missed. > On Jan 24, 2022, at 7:17 AM, Matthias Leopold via users > wrote: > > To be sure: both pa

Re: [OMPI users] Open MPI + Slurm + lmod

2022-01-24 Thread Matthias Leopold via users
To be sure: both packages were provided by NVIDIA (I didn't compile them). On 24.01.22 at 16:13, Matthias Leopold wrote: Thx, but I don't see this option in either of the two versions: /usr/mpi/gcc/openmpi-4.1.2a1/bin/ompi_info (works with slurm): Configure command line: '--build=x86_64-linux-g

Re: [OMPI users] Open MPI + Slurm + lmod

2022-01-24 Thread Matthias Leopold via users
Thx, but I don't see this option in either of the two versions: /usr/mpi/gcc/openmpi-4.1.2a1/bin/ompi_info (works with slurm): Configure command line: '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sy

Re: [OMPI users] Open MPI + Slurm + lmod

2022-01-24 Thread Ralph Castain via users
If you look at your configure line, you forgot to include --with-pmi=. We don't build the Slurm PMI support by default due to the GPL licensing issues - you have to point at it. > On Jan 24, 2022, at 6:41 AM, Matthias Leopold via users > wrote: > > Hi, > > we have 2 DGX A100 machines and I'
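A quick way to check an existing installation for the option mentioned above is to search the configure line that `ompi_info` prints. The following is a minimal sketch: `has_pmi_flag` is a hypothetical helper name, and a "no" is not conclusive, since options supplied through a platform file may not appear on that line.

```shell
# Hypothetical helper: report whether a configure line contains --with-pmi
# (and not merely --with-pmix).
has_pmi_flag() {
    # $1: text of the "Configure command line" entry from ompi_info
    case "$1" in
        *--with-pmi=*|*"--with-pmi'"*|*"--with-pmi "*|*--with-pmi) echo yes ;;
        *) echo no ;;
    esac
}

if command -v ompi_info >/dev/null 2>&1; then
    # ompi_info prints the configure invocation among its build details.
    cfg=$(ompi_info | grep -i "configure command line")
else
    cfg=""   # Open MPI is not on PATH here; nothing to inspect
fi

echo "built with --with-pmi: $(has_pmi_flag "$cfg")"
```

To inspect a specific installation rather than the first one on PATH, run that installation's own `ompi_info` binary by its full path.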

[OMPI users] Open MPI + Slurm + lmod

2022-01-24 Thread Matthias Leopold via users
Hi, we have 2 DGX A100 machines and I'm trying to run nccl-tests (https://github.com/NVIDIA/nccl-tests) in various ways to understand how things work. I can successfully run nccl-tests on both nodes with Slurm (via srun) when built directly on a compute node against Open MPI 4.1.2 coming fro