All,

This is probably a very basic question, but I need to ask. Recently the cluster I use installed UCX and PMIx, which is nice. I'm currently trying to build a stack of Open MPI 4.0.0 that can see those, but until then I thought I'd try Intel MPI based on https://slurm.schedmd.com/mpi_guide.html#intel_mpi

First, SLURM does seem to see PMIx:

(1041)(master) $ srun --version
srun: cluster configuration lacks support for cpu binding
slurm 17.11.12
(1042)(master) $ srun --mpi=list
srun: cluster configuration lacks support for cpu binding
srun: MPI types are...
srun: pmi2
srun: none
srun: openmpi
srun: pmix
srun: pmix_v2

And I can run fine with mpirun (I've already salloc'd some nodes), which is how I always run with Intel MPI:

(1051)(master) $ mpirun -np 4 ./helloWorld.mpi3.SLES12.IMPI.exe
Compiler Version: Intel(R) Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 19.0.1.144 Build 20181018
MPI Version: 3.1
MPI Library Version: Intel(R) MPI Library 2019 Update 1 for Linux* OS
Process 0 of 4 is on borgc129
Process 1 of 4 is on borgc129
Process 2 of 4 is on borgc129
Process 3 of 4 is on borgc129

But I run into trouble when I try to use Intel MPI with srun. It just halts for a minute or so with:

(1059)(master) $ env I_MPI_PMI_LIBRARY=/usr/nlocal/pmix/2.1/lib64/libpmi2.so srun -n 4 ./helloWorld.mpi3.SLES12.IMPI.exe
srun: cluster configuration lacks support for cpu binding
srun: Warning: can't run 4 processes on 8 nodes, setting nnodes to 4

and then I see:

srun: Job 36007416 step creation temporarily disabled, retrying

So I'm doing something dumb, obviously, but do you know what?

Thanks,
Matt

--
Matt Thompson, SSAI, Sr Scientific Programmer/Analyst
NASA GSFC, Global Modeling and Assimilation Office
Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
Phone: 301-614-6712 Fax: 301-614-6246
http://science.gsfc.nasa.gov/sed/bio/matthew.thompson