Is Slurm built with PMIx support? Did you tell srun to use it?
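A quick sketch of what I'd check first (the binary name and task count below are just placeholders for your hello-world test):

  # list the PMI plugins this srun was built with
  $ srun --mpi=list

  # see what the cluster defaults to when --mpi isn't given
  $ scontrol show config | grep -i MpiDefault

  # if a pmix (or pmix_v2/pmix_v3) plugin is listed, ask for it explicitly
  $ srun --mpi=pmix -n 2 ./hello

If no pmix entry shows up in that list, srun falls back to Slurm's legacy PMI-1/PMI-2 interface, and an Open MPI 4.x built the way yours is (internal PMIx, no --with-pmi) can't bootstrap through it, which gives aborts in MPI_Init like the ones below.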
> On Apr 23, 2020, at 7:00 AM, Prentice Bisbal via users <users@lists.open-mpi.org> wrote:
>
> I'm using OpenMPI 4.0.3 with Slurm 19.05.5. I'm testing the software with a very simple hello, world MPI program that I've used reliably for years. When I submit the job through Slurm and use srun to launch the job, I get these errors:
>
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***    and potentially your MPI job)
> [dawson029.pppl.gov:26070] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***    and potentially your MPI job)
> [dawson029.pppl.gov:26076] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
>
> If I run the same job, but use mpiexec or mpirun instead of srun, the jobs run just fine. I checked ompi_info to make sure OpenMPI was compiled with Slurm support:
>
> $ ompi_info | grep slurm
>   Configure command line: '--prefix=/usr/pppl/intel/2019-pkgs/openmpi-4.0.3' '--disable-silent-rules' '--enable-shared' '--with-pmix=internal' '--with-slurm' '--with-psm'
>        MCA ess: slurm (MCA v2.1.0, API v3.0.0, Component v4.0.3)
>        MCA plm: slurm (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>        MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>     MCA schizo: slurm (MCA v2.1.0, API v1.0.0, Component v4.0.3)
>
> Any ideas what could be wrong? Do you need any additional information?
>
> Prentice