Hi, I'm having some difficulties building a working Open MPI configuration for an InfiniBand cluster.
The build was done with GCC 9.3.0 and configured as follows:

'--prefix=/opt/mpi/openmpi/4.0.4/gnu/9.3.0' '--with-slurm' '--enable-shared' '--with-pmi' 'CC=/opt/gnu/gcc/9.3.0/bin/gcc' 'CPP=/opt/gnu/gcc/9.3.0/bin/cpp' 'CXX=/opt/gnu/gcc/9.3.0/bin/g++' 'FC=/opt/gnu/gcc/9.3.0/bin/gfortran' '--with-psm'

We use Slurm as the scheduler, and when running an application (srun ./<application>) I get the following error:

--------------------------------------------------------------------------
By default, for Open MPI 4.0 and later, infiniband ports on a device are
not used by default. The intent is to use UCX for these devices. You can
override this policy by setting the btl_openib_allow_ib MCA parameter to
true.

  Local host:      nymph2
  Local adapter:   qib0
  Local port:      1
--------------------------------------------------------------------------

I have tried setting this MCA parameter (mpirun --mca btl_openib_allow_ib true ./<application>), which then results in this error:

nymph2.1318PSM can't open /dev/ipath for reading and writing (err=23)
nymph2.1320PSM can't open /dev/ipath for reading and writing (err=23)
nymph2.1319PSM can't open /dev/ipath for reading and writing (err=23)
--------------------------------------------------------------------------
PSM was unable to open an endpoint.
Please make sure that the network link is active on the node and the
hardware is functioning.

  Error: Failure in initializing endpoint
--------------------------------------------------------------------------

The cluster hardware is QLogic InfiniBand with Intel CPUs, so my understanding is that we should be using the older PSM library for networking.

Any thoughts on what might be going wrong with the build?

Many thanks,
Dean
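
P.S. In case it helps with diagnosis, I'm happy to run checks along these lines on a compute node. The ompi_info path is just the bin directory under the install prefix above, <application> is a placeholder as before, and forcing the cm PML / psm MTL is only meant to take component selection out of the picture:

  # confirm the PSM MTL component was actually built into this install
  /opt/mpi/openmpi/4.0.4/gnu/9.3.0/bin/ompi_info | grep -i psm

  # check that the QLogic device node PSM is failing to open exists and is readable/writable
  ls -l /dev/ipath

  # try forcing the PSM path explicitly rather than relying on automatic selection
  mpirun --mca pml cm --mca mtl psm ./<application>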