Hi, 

I’m having some difficulty building a working Open MPI configuration for an 
InfiniBand cluster. 

Open MPI was built with GCC 9.3.0 and configured like so: 
  '--prefix=/opt/mpi/openmpi/4.0.4/gnu/9.3.0' '--with-slurm' '--enable-shared' 
'--with-pmi' 'CC=/opt/gnu/gcc/9.3.0/bin/gcc' 'CPP=/opt/gnu/gcc/9.3.0/bin/cpp' 
'CXX=/opt/gnu/gcc/9.3.0/bin/g++' 'FC=/opt/gnu/gcc/9.3.0/bin/gfortran' 
'--with-psm'
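
For reference, I would expect PSM support to show up in the component list if 
configure actually picked it up; this is roughly what I check (assuming the 
component is simply named "psm"): 

  # List any PSM-related MCA components in this build
  /opt/mpi/openmpi/4.0.4/gnu/9.3.0/bin/ompi_info | grep -i psm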

We use Slurm as the scheduler, and when running an application (srun 
./<application>) I get the following error: 

--------------------------------------------------------------------------
By default, for Open MPI 4.0 and later, infiniband ports on a device
are not used by default.  The intent is to use UCX for these devices.
You can override this policy by setting the btl_openib_allow_ib MCA parameter
to true.

  Local host:              nymph2
  Local adapter:           qib0
  Local port:              1

--------------------------------------------------------------------------

I have tried setting this MCA parameter (mpirun --mca btl_openib_allow_ib true 
./<application>), which then results in the following error: 

nymph2.1318PSM can't open /dev/ipath for reading and writing (err=23)
nymph2.1320PSM can't open /dev/ipath for reading and writing (err=23)
nymph2.1319PSM can't open /dev/ipath for reading and writing (err=23)
--------------------------------------------------------------------------
PSM was unable to open an endpoint. Please make sure that the network link is
active on the node and the hardware is functioning.

  Error: Failure in initializing endpoint
--------------------------------------------------------------------------
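
In case it matters, I assume the next thing to check is whether the InfiniPath 
device node from the error above exists and is accessible to normal users, 
along these lines (the ib_qib module name is my assumption for these QLogic 
adapters): 

  # Check that the device node exists and is readable/writable
  ls -l /dev/ipath
  # Check that the QLogic InfiniBand driver is loaded
  lsmod | grep ib_qib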

The cluster hardware is QLogic InfiniBand with Intel CPUs. My understanding is 
that we should be using the older PSM library for networking. 
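
As a sanity check, my understanding is that forcing the cm PML together with 
the PSM MTL should bypass the openib BTL entirely (assuming "psm" is the right 
MTL name for these QLogic HCAs): 

  # Explicitly select the matching-transport PML and the QLogic PSM MTL
  mpirun --mca pml cm --mca mtl psm ./<application>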

Any thoughts on what might be going wrong with the build? 

Many Thanks, 

Dean 
