Hi,
I'm working on refreshing an old cluster with AlmaLinux 9 (instead of CentOS 6 😕) and building a fresh OpenMPI 5.0.5 environment. I've reached the step where OpenMPI starts to work with UCX 1.17 and PMIx 5.0.3, but not completely. The nodes use a QLogic QDR HBA with a managed QLogic switch (40Gb/s) plus 1Gb/s Ethernet, and I have limited knowledge of the software stack now required with UCX for this hardware.
This is the output of the osu_bw test between 2 nodes (in a Slurm context):
bash-5.1$ mpirun --mca pml ucx --mca osc ucx --mca scoll ucx --mca atomic ucx osu_bw
# OSU MPI Bandwidth Test v7.4
# Datatype: MPI_CHAR.
# Size      Bandwidth (MB/s)
1                       0.30
2                       0.59
4                       1.16
8                       2.33
16                      4.78
32                      9.46
64                     18.80
128                    36.21
256                    69.61
512                   142.48
1024                  256.41
2048                  498.27
4096                  719.19
8192                 1010.86
16384                1416.17
32768                1935.44
65536                2509.17
131072               2786.79
262144               2401.26
524288                500.32
1048576               854.12
2097152              3114.28
4194304              1830.78
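In case it helps, I can also ask UCX what it sees on the nodes and pin it to the IB port explicitly; something like the commands below is what I would try (the device name qib0:1 is only my assumption for the ib_qib driver, I haven't confirmed it):

bash-5.1$ ucx_info -d | grep -i -E 'transport|device'
bash-5.1$ mpirun --mca pml ucx --mca osc ucx \
    -x UCX_NET_DEVICES=qib0:1 -x UCX_TLS=rc,sm,self \
    -x UCX_LOG_LEVEL=info osu_bw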
The options come from https://docs.open-mpi.org/en/main/tuning-apps/networking/ib-and-roce.html; without them the job uses the slow 1Gb/s Ethernet interface. The osu_bibw test is even worse: as soon as the message size increases it behaves as if some congestion occurs.
# OSU MPI Bi-Directional Bandwidth Test v7.4
# Datatype: MPI_CHAR.
# Size      Bandwidth (MB/s)
1                       0.52
2                       1.04
4                       2.08
8                       4.18
16                      8.37
32                     16.76
64                     33.11
128                    65.93
256                   130.89
512                   248.77
1024                  492.23
2048                 1024.23
4096                 1622.98
8192                 2352.29
16384                1724.83
32768                2309.67
65536                2538.13
131072               2586.15
262144                 95.93
524288                 42.83
1048576                63.14
2097152                78.81
4194304               129.66
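I suppose I could also raise the PML selection verbosity to be sure the UCX PML is really the one used and not a fallback; this is only a guess at a useful invocation:

bash-5.1$ mpirun --mca pml ucx --mca pml_base_verbose 10 osu_bibw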
1) I've built UCX 1.17.0 with the gcc 11.4 provided by the OS, as I need a thread-safe version (suggested by Gilles Gouaillardet when I was building UCX for OpenMPI 4.0.4 on another cluster with HDR100 and had some performance troubles):
../ucx/contrib/configure-release --enable-mt
2) I've built a fresh version of PMIx 5.0.3 with the gcc 11.4 provided by the OS, without specific options:
prefix=/usr build_srpm=yes build_multiple=yes ./buildrpm.sh ../../pmix-5.0.3.tar.bz2
3) Slurm is built with PMIx and UCX, with the gcc 11.4 provided by the OS.
4) Then I've built OpenMPI with a fresh install of gcc 14.2 (to have a correct version of the Fortran module). Configure command line:
'--enable-mpirun-prefix-by-default' '--prefix=/opt/GCC14/OpenMPI/5.0.5' '--enable-mpi1-compatibility' '--with-slurm'
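Since that configure line doesn't explicitly point at the external UCX and PMIx installs, I'm only assuming they were picked up automatically; the only check I know of is to grep ompi_info, something like:

bash-5.1$ ompi_info | grep -i -E 'ucx|pmix'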
PATH and LD_LIBRARY_PATH are set via the module environment tool.
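The module basically does the equivalent of the lines below (paths taken from the prefix above; the lib directory name is my guess, it may be lib64 on this platform):

export PATH=/opt/GCC14/OpenMPI/5.0.5/bin:$PATH
export LD_LIBRARY_PATH=/opt/GCC14/OpenMPI/5.0.5/lib:$LD_LIBRARY_PATH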
Using the old deployment of this cluster (same QLogic HBA and IB switch), based on OpenMPI 3.1.3rc1 with openib and gcc 7.3, it works fine.
Configure command line: '--prefix=/share/apps/GCC73/openmpi/31-patch' '--enable-mpirun-prefix-by-default' '--disable-dlopen' '--enable-mpi-cxx' '--without-slurm' '--enable-mpi-thread-multiple'
# OSU MPI Bi-Directional Bandwidth Test v7.4
# Datatype: MPI_CHAR.
# Size      Bandwidth (MB/s)
1                       1.93
....
1048576              6034.23
2097152              6028.31
4194304              6033.63
The basic AlmaLinux packages deployed to manage the InfiniBand network are:
- kernel-lt => required for the ib_qib module that is not available with AlmaLinux 9
- kernel-lt-devel
- infiniband-diags
- libibumad
- rdma-core
- ib_qib-ibverbs
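For what it's worth, these are the commands I use to look at the state of the IB link on the nodes (assuming ibv_devinfo is provided by the rdma-core / libibverbs utilities on this install):

bash-5.1$ ibstat
bash-5.1$ ibv_devinfo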
My UCX thread-safe packages deployed:
- ucx-threadsafe-1.17.0-1.el9.x86_64
- ucx-threadsafe-devel-1.17.0-1.el9.x86_64
- ucx-threadsafe-ib-1.17.0-1.el9.x86_64
- ucx-threadsafe-rdmacm-1.17.0-1.el9.x86_64
- ucx-threadsafe-cma-1.17.0-1.el9.x86_64
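To verify that --enable-mt really made it into the deployed packages, I think the build configuration reported by ucx_info can be checked, something like:

bash-5.1$ ucx_info -v | grep -i configured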
Maybe I'm wrong there too.
Thanks all for your help.
Patrick