Hi,

I'm back on this Open MPI deployment as I need some explanations about version 5.0.5.

First, here is what I have done with OpenMPI on my old QLogic InfiniBand network, with the AlmaLinux 9 operating system, Slurm and GCC 14.2:

- I've downloaded and built the psm library from https://github.com/pdlfs/psm.git. This fork is maintained, whereas I was not able to compile the code from the official archived git repo.
- I've noticed bad modes on the /dev/ipath* files: they were 0600 instead of 0666. I corrected this with a udev rule.
- I've built an old OpenMPI 3.1.x version from an archive I created from the git repo on 2018/10/10, as this is the version running in production on the old cluster with CentOS 6. The old cluster uses a software package provided by Intel for the modules and the libraries. I added the --with-psm=/my/psm/dir option and the TrueScale network works fine with AlmaLinux 9 + GCC 14.2 + my psm build (same performance with osu_bw, osu_bibw...).
- I've built OpenMPI 4.1.6 with UCX 1.17 and my psm build (I added the --with-psm=/my/psm/dir option) and now it works fine too. I just have to set OMPI_MCA_btl_openib_allow_ib=true to avoid warning messages like "There was an error initializing an OpenFabrics device." for qib0.
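For reference, the udev rule is nothing more than something like this (a minimal sketch; the file name 99-ipath.rules is just my choice):

    # /etc/udev/rules.d/99-ipath.rules  (file name is arbitrary)
    # open the TrueScale character devices so non-root MPI jobs can use PSM
    KERNEL=="ipath*", MODE="0666"

followed by "udevadm control --reload-rules && udevadm trigger" (or a reboot) to apply it.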

1) Open MPI 5.0.5 no longer provides the --with-psm option, so I understand that UCX is now supposed to provide the PSM support. Right?

2) So my UCX build is not correct, since OpenMPI 4.1.6 built with UCX doesn't work if I remove the --with-psm option. Still right?

3) I've also understood that neither psm2 (for Omni-Path) nor psm3 is backward compatible with PSM (TrueScale).

4) From https://hpc.guix.info/blog/2019/12/optimized-and-portable-open-mpi-packaging/ I understand that I must now use libfabric and libfabric-devel (1.18.0-1.el9 is available but not installed at this time), but looking at the rpm contents I only see psm2 and psm3 references, not psm.
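Once the libfabric packages are installed, I suppose I can simply list the providers compiled into the el9 libfabric to confirm whether a psm (TrueScale) provider exists at all, something like:

    # lists the provider names available in the installed libfabric
    fi_info -l

but I have not tried it yet.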

Any advice on building OpenMPI 5.0.5 with UCX, in the InfiniBand jungle, for my TrueScale network is welcome.

Patrick


On 30/09/2024 at 18:41, Patrick Begou via users wrote:
Hi Nathan

thanks for this suggestion. I had understood that everything is now managed by the UCX layer. Am I wrong? These options do not seem to work with my OpenMPI 5.0.5 build. But I built OpenMPI on the cluster front-end, and it had no HBA at that time. I've added one this evening (an old spare one I had) and the software to manage it; maybe I should rebuild OpenMPI?
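To check what was actually compiled in, I suppose I can just list the components of my build with ompi_info, something like:

    # show which pml/mtl components this Open MPI build contains
    ompi_info | grep -E " (mtl|pml):"

If no psm2 mtl shows up there, I guess rebuilding OpenMPI on a machine that has the HBA and its libraries installed is indeed required.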

bash-5.1$ mpirun hostname
kareline-0-0.localcluster.priv
kareline-0-1.localcluster.priv

bash-5.1$ mpirun --mca pml cm --mca mtl psm2 osu_bw
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened.  This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

Host:      kareline-0-0
Framework: mtl
Component: psm2
--------------------------------------------------------------------------
--------------------------------------------------------------------------
No components were able to be opened in the pml framework.

This typically means that either no components of this type were
installed, or none of the installed components can be loaded.
Sometimes this means that shared libraries required by these
components are unable to be found/loaded.

  Host:      kareline-0-0
  Framework: pml
--------------------------------------------------------------------------

bash-5.1$ which mpirun
/opt/GCC14/OpenMPI/5.0.5/bin/mpirun
bash-5.1$


Patrick

On 30/09/2024 at 18:26, Nathan Hjelm wrote:
If this is a QLogic system why not try psm2 (--mca pml cm --mca mtl psm2)? Not sure how good UCX support is over these systems and psm2 is the vendor's library.

Not sure what the right link is to the current version but found this version:

<https://github.com/cornelisnetworks/opa-psm2>


-Nathan


On Sep 30, 2024, at 10:18 AM, Patrick Begou via users <users@lists.open-mpi.org> wrote:


Hi,

I'm working on refreshing an old cluster with AlmaLinux 9 (instead of CentOS 6 😕) and building a fresh OpenMPI 5.0.5 environment. I've reached the step where OpenMPI begins to work with UCX 1.17 and PMIx 5.0.3, but not totally. The nodes use a QLogic QDR HBA with a managed QLogic switch (40 Gb/s) plus 1 Gb/s Ethernet, and I have limited knowledge of the software stack now required with UCX for this hardware.

This is the output of the osu_bw test between two nodes (in a Slurm context):

bash-5.1$ mpirun --mca pml ucx --mca osc ucx --mca scoll ucx --mca atomic ucx osu_bw
# OSU MPI Bandwidth Test v7.4
# Datatype: MPI_CHAR.
# Size      Bandwidth (MB/s)
1                       0.30
2                       0.59
4                       1.16
8                       2.33
16                      4.78
32                      9.46
64                     18.80
128                    36.21
256                    69.61
512                   142.48
1024                  256.41
2048                  498.27
4096                  719.19
8192                 1010.86
16384                1416.17
32768                1935.44
65536                2509.17
131072               2786.79
262144               2401.26
524288                500.32
1048576               854.12
2097152              3114.28
4194304              1830.78

The options come from https://docs.open-mpi.org/en/main/tuning-apps/networking/ib-and-roce.html; without them it uses the slow 1 Gb/s Ethernet interface. The osu_bibw test is worse: as soon as the message size increases, it looks as if some congestion occurs.

# OSU MPI Bi-Directional Bandwidth Test v7.4
# Datatype: MPI_CHAR.
# Size      Bandwidth (MB/s)
1                       0.52
2                       1.04
4                       2.08
8                       4.18
16                      8.37
32                     16.76
64                     33.11
128                    65.93
256                   130.89
512                   248.77
1024                  492.23
2048                 1024.23
4096                 1622.98
8192                 2352.29
16384                1724.83
32768                2309.67
65536                2538.13
131072               2586.15
262144                 95.93
524288                 42.83
1048576                63.14
2097152                78.81
4194304               129.66

1) I've built UCX 1.17.0 with the GCC 11.4 provided by the OS, as I need a thread-safe version (suggested by Gilles Gouaillardet when I was building UCX for OpenMPI 4.0.4 on another cluster with HDR100 and had some performance troubles):

../ucx/contrib/configure-release --enable-mt
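To be sure the package I installed is really this multi-threaded build, I check the build information reported by UCX itself:

    # prints the UCX version and the configure options it was built with
    ucx_info -v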

2) I've built a fresh version of PMIx 5.0.3 with the GCC 11.4 provided by the OS, without specific options:

prefix=/usr build_srpm=yes build_multiple=yes ./buildrpm.sh ../../pmix-5.0.3.tar.bz2

3) Slurm is built with PMIx and UCX, with the GCC 11.4 provided by the OS.
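As a sanity check on the Slurm side, I also look at which MPI plugins it reports, which should include pmix:

    # lists the --mpi types this Slurm build supports
    srun --mpi=list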

4) Then I've built OpenMPI with a fresh install of GCC 14.2 (to have a correct version of the Fortran module). Configure command line:

'--enable-mpirun-prefix-by-default' '--prefix=/opt/GCC14/OpenMPI/5.0.5' '--enable-mpi1-compatibility' '--with-slurm'
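Maybe I should be more explicit and point configure at the external UCX and PMIx installs, something like this sketch (the /usr prefixes are just where my rpms install things, to be adapted):

    ./configure --prefix=/opt/GCC14/OpenMPI/5.0.5 \
        --enable-mpirun-prefix-by-default \
        --enable-mpi1-compatibility \
        --with-slurm \
        --with-ucx=/usr \
        --with-pmix=/usr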

PATH and LD_LIBRARY_PATH are set via the module environment tool.
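For reference, the modulefile is only a couple of lines (Tcl syntax; the location of the modulefile itself is my local convention):

    #%Module1.0
    prepend-path PATH            /opt/GCC14/OpenMPI/5.0.5/bin
    prepend-path LD_LIBRARY_PATH /opt/GCC14/OpenMPI/5.0.5/lib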


Using the old deployment of this cluster (same QLogic HBA and IB switch), based on OpenMPI 3.1.3rc1 with openib and GCC 7.3, it works fine. Configure command line:

'--prefix=/share/apps/GCC73/openmpi/31-patch' '--enable-mpirun-prefix-by-default' '--disable-dlopen' '--enable-mpi-cxx' '--without-slurm' '--enable-mpi-thread-multiple'

# OSU MPI Bi-Directional Bandwidth Test v7.4
# Datatype: MPI_CHAR.
# Size      Bandwidth (MB/s)
1                       1.93
....
1048576              6034.23
2097152              6028.31
4194304              6033.63

The basic AlmaLinux packages deployed to manage the InfiniBand network are:

- kernel-lt => required for the ib_qib module that is not available with AlmaLinux 9
- kernel-lt-devel
- infiniband-diags
- libibumad
- rdma-core
- ib_qib-ibverbs
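On each node I check that the driver and the link are up with the standard tools from these packages, for example:

    # the ib_qib module must be loaded for the TrueScale HCA
    lsmod | grep ib_qib
    # port state and rate of the adapter (qib0 here)
    ibstat qib0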

My UCX thread-safe packages deployed:

- ucx-threadsafe-1.17.0-1.el9.x86_64
- ucx-threadsafe-devel-1.17.0-1.el9.x86_64
- ucx-threadsafe-ib-1.17.0-1.el9.x86_64
- ucx-threadsafe-rdmacm-1.17.0-1.el9.x86_64
- ucx-threadsafe-cma-1.17.0-1.el9.x86_64
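I also verify that this UCX actually sees the qib device and its transports on a compute node:

    # list the devices/transports UCX detects (hopefully rc/ud verbs over qib0)
    ucx_info -d | grep -iE "transport|device"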

Maybe I'm wrong there too.

Thanks all for your help.

Patrick


