Is Slurm built with PMIx support? Did you tell srun to use it?
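
In case it helps, here is a rough sketch of how one might check both things from a login node. It assumes srun is on your PATH; `has_pmix` is just a helper name made up for this example, and `./hello_mpi` is a placeholder for your test binary. The exact plugin names that `srun --mpi=list` reports (pmix, pmix_v2, pmix_v3, ...) depend on how your Slurm was built, so treat the match as approximate.

```shell
# Hypothetical helper: succeeds if the given plugin listing mentions PMIx.
has_pmix() {
    printf '%s\n' "$1" | grep -qi 'pmix'
}

# Only query Slurm if srun is actually installed on this machine.
if command -v srun >/dev/null 2>&1; then
    # Some Slurm versions print the plugin list on stderr, so capture both.
    plugins=$(srun --mpi=list 2>&1)
    if has_pmix "$plugins"; then
        # ./hello_mpi is a placeholder for your own test program.
        echo "PMIx plugin listed; try: srun --mpi=pmix ./hello_mpi"
    else
        echo "No PMIx plugin listed; Slurm may not have been built with PMIx support"
    fi
fi
```

If no PMIx plugin shows up, srun has no way to wire up an Open MPI build that only has its internal PMIx, which would be consistent with mpirun working while srun fails in MPI_Init.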

> On Apr 23, 2020, at 7:00 AM, Prentice Bisbal via users 
> <users@lists.open-mpi.org> wrote:
> 
> I'm using OpenMPI 4.0.3 with Slurm 19.05.5. I'm testing the software with a 
> very simple hello, world MPI program that I've used reliably for years. When 
> I submit the job through slurm and use srun to launch the job, I get these 
> errors:
> 
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***    and potentially your MPI job)
> [dawson029.pppl.gov:26070] Local abort before MPI_INIT completed completed 
> successfully, but am not able to aggregate error messages, and not able to 
> guarantee that all other processes were killed!
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***    and potentially your MPI job)
> [dawson029.pppl.gov:26076] Local abort before MPI_INIT completed completed 
> successfully, but am not able to aggregate error messages, and not able to 
> guarantee that all other processes were killed!
> 
> If I run the same job, but use mpiexec or mpirun instead of srun, the jobs 
> run just fine. I checked ompi_info to make sure OpenMPI was compiled with  
> Slurm support:
> 
> $ ompi_info | grep slurm
>   Configure command line: '--prefix=/usr/pppl/intel/2019-pkgs/openmpi-4.0.3' 
> '--disable-silent-rules' '--enable-shared' '--with-pmix=internal' 
> '--with-slurm' '--with-psm'
>                  MCA ess: slurm (MCA v2.1.0, API v3.0.0, Component v4.0.3)
>                  MCA plm: slurm (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>                  MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>               MCA schizo: slurm (MCA v2.1.0, API v1.0.0, Component v4.0.3)
> 
> Any ideas what could be wrong? Do you need any additional information?
> 
> Prentice
> 

