I'd recommend against using Open MPI v3.1.0 -- it's quite old.  If you have to 
use Open MPI v3.1.x, I'd at least suggest using v3.1.6, which has all the 
rolled-up bug fixes in the v3.1.x series.

That being said, Open MPI v4.1.2 is the most current release.  Open MPI v4.1.2 does 
restrict which versions of UCX it uses because there are bugs in the older 
versions of UCX.  I am not intimately familiar with UCX -- you'll need to ask 
Nvidia for support there -- but I was under the impression that it's just a 
user-level library, and you could certainly install your own copy of UCX to use 
with your compilation of Open MPI.  I.e., you're not restricted to whatever UCX 
is installed in the cluster system-default locations.
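
As a rough sketch of what a private UCX install could look like: the script
below builds UCX into a user-writable prefix and then points Open MPI's
configure at it with --with-ucx. The directory names (PREFIX, UCX_SRC,
OMPI_SRC) and the run_in helper are assumptions made for illustration, both
source trees are assumed to be unpacked release tarballs, and the extra
configure flags are simply carried over from the configure line quoted below.
By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: build a private UCX and configure Open MPI against it.
# All paths are placeholders (assumptions); adjust them to your site.
set -e

PREFIX="${PREFIX:-$HOME/sw}"                     # user-writable install root
UCX_SRC="${UCX_SRC:-$HOME/src/ucx}"              # unpacked UCX release tarball
OMPI_SRC="${OMPI_SRC:-$HOME/src/openmpi-4.1.2}"  # unpacked Open MPI tarball
DRY_RUN="${DRY_RUN:-1}"                          # 1 = print only, 0 = execute

# Hypothetical helper: print a command, and run it inside a directory
# unless DRY_RUN is 1.
run_in() {
    dir=$1
    shift
    echo "+ (cd $dir && $*)"
    if [ "$DRY_RUN" != "1" ]; then
        (cd "$dir" && "$@")
    fi
}

# 1) Build and install UCX under $PREFIX/ucx.
run_in "$UCX_SRC" ./contrib/configure-release --prefix="$PREFIX/ucx"
run_in "$UCX_SRC" make -j4
run_in "$UCX_SRC" make install

# 2) Build Open MPI against that UCX instead of the system copy in /opt/ucx.
run_in "$OMPI_SRC" ./configure --prefix="$PREFIX/openmpi" \
    --with-ucx="$PREFIX/ucx" --without-mxm --with-slurm
run_in "$OMPI_SRC" make -j4
run_in "$OMPI_SRC" make install
```

Running it as-is shows the plan; running it with DRY_RUN=0 performs the
builds. Which UCX release is new enough for a given Open MPI version is not
covered here; Open MPI's configure reports that check.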

I don't know why you're getting MXM-specific error messages; those don't appear 
to be coming from Open MPI (especially since you configured Open MPI with 
--without-mxm).  If you can upgrade to Open MPI v4.1.2 and the latest UCX, see 
if you are still getting those MXM error messages.
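
One way to narrow down where the MXM messages originate, as a sketch only:
check whether the Open MPI build lists any MXM-based components, and whether
another library it loads links against libmxm. The hcoll path comes from the
--with-hcoll option in the configure line quoted below; the lib/libhcoll.so
file name underneath it is an assumption about the local install, and
mentions_mxm is a hypothetical helper. Whether hcoll is actually the source
of the warnings has not been verified.

```shell
#!/bin/sh
# Sketch: look for places where MXM could be pulled in at run time.

# Hypothetical helper: succeeds if the text on stdin mentions libmxm.
mentions_mxm() {
    grep -q 'libmxm'
}

# 1) Does this Open MPI build list any MXM-based components?
#    Only "MCA" lines are inspected, because the configure command line
#    printed by ompi_info also contains the string "--without-mxm".
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | grep 'MCA' | grep -iE 'mxm|yalla' \
        || echo "ompi_info lists no MXM-based components"
else
    echo "ompi_info not found in PATH; skipping component check"
fi

# 2) Does the hcoll library link against libmxm?
#    The file name below is an assumption; override HCOLL_LIB if needed.
HCOLL_LIB="${HCOLL_LIB:-/opt/mellanox/hcoll/lib/libhcoll.so}"
if [ -e "$HCOLL_LIB" ]; then
    if ldd "$HCOLL_LIB" | mentions_mxm; then
        echo "$HCOLL_LIB links against libmxm"
    else
        echo "$HCOLL_LIB does not link against libmxm"
    fi
else
    echo "$HCOLL_LIB not found; set HCOLL_LIB to the correct path"
fi
```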

--
Jeff Squyres
jsquy...@cisco.com

________________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of Angel de Vicente 
via users <users@lists.open-mpi.org>
Sent: Friday, February 18, 2022 5:46 PM
To: Gilles Gouaillardet via users
Cc: Angel de Vicente
Subject: Re: [OMPI users] Trouble compiling OpenMPI with Infiniband support

Hello,

Gilles Gouaillardet via users <users@lists.open-mpi.org> writes:

> Infiniband detection likely fails before checking expanded verbs.

Thanks for this. In the end, after experimenting with different options,
I managed to install OpenMPI 3.1.0 in our cluster using UCX (I wanted
4.1.1, but that would not compile cleanly with the old version of UCX
that is installed in the cluster). The configure command line (as
reported by ompi_info) was:

,----
|   Configure command line:
|     '--prefix=/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-9.3.0/openmpi-3.1.0-g5a7szwxcsgmyibqvwwavfkz5b4i2ym7'
|     '--enable-shared' '--disable-silent-rules'
|     '--disable-builtin-atomics' '--with-pmi=/usr'
|     '--with-zlib=/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-9.3.0/zlib-1.2.11-hrstx5ffrg4f4k3xc2anyxed3mmgdcoz'
|     '--without-knem' '--with-hcoll=/opt/mellanox/hcoll'
|     '--without-psm' '--without-ofi' '--without-cma'
|     '--with-ucx=/opt/ucx' '--without-fca'
|     '--without-mxm' '--without-verbs' '--without-xpmem'
|     '--without-psm2' '--without-alps' '--without-lsf'
|     '--without-sge' '--with-slurm' '--without-tm'
|     '--without-loadleveler' '--disable-memchecker'
|     '--with-hwloc=/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-9.3.0/hwloc-1.11.13-kpjkidab37wn25h2oyh3eva43ycjb6c5'
|     '--disable-java' '--disable-mpi-java'
|     '--without-cuda' '--enable-wrapper-rpath'
|     '--disable-wrapper-runpath' '--disable-mpi-cxx'
|     '--disable-cxx-exceptions'
|     '--with-wrapper-ldflags=-Wl,-rpath,/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-7.2.0/gcc-9.3.0-ghr2jekwusoa4zip36xsa3okgp3bylqm/lib/gcc/x86_64-pc-linux-gnu/9.3.0
|       -Wl,-rpath,/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-7.2.0/gcc-9.3.0-ghr2jekwusoa4zip36xsa3okgp3bylqm/lib64'
`----


The versions that I'm using are:

gcc:   9.3.0
mxm:   3.6.3102      (though I configure OpenMPI --without-mxm)
hcoll: 3.8.1649
knem:  1.1.2.90mlnx2 (though I configure OpenMPI --without-knem)
ucx:   1.2.2947
slurm: 18.08.7


It looks like everything executes fine, but I have a couple of warnings,
and I'm not sure how much I should worry and what I could do about them:

1) Conflicting CPU frequencies detected:

[1645221586.038838] [s01r3b78:11041:0]         sys.c:744  MXM  WARN  Conflicting CPU frequencies detected, using: 3151.41
[1645221585.740595] [s01r3b79:11484:0]         sys.c:744  MXM  WARN  Conflicting CPU frequencies detected, using: 2998.76

2) Won't use knem. In a previous attempt I specified --with-knem and
got this warning about not being able to open /dev/knem. I guess our
cluster is not properly configured w.r.t. knem, so I rebuilt OpenMPI
--without-knem, but I still get this message:

[1645221587.091122] [s01r3b74:9054 :0]         shm.c:65   MXM  WARN  Could not open the KNEM device file at /dev/knem : No such file or directory. Won't use knem.
[1645221587.104807] [s01r3b76:8610 :0]         shm.c:65   MXM  WARN  Could not open the KNEM device file at /dev/knem : No such file or directory. Won't use knem.


Any help/pointers appreciated. Many thanks,
--
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/