Hi Ole,

We found that UCX isn't very useful or performant on OmniPath, so if your
compiled OpenMPI isn't used on both InfiniBand and OmniPath you can compile
OpenMPI using "eb --filter-deps=UCX ...".
Open MPI works well there either way: using libpsm2 directly (with the "cm"
pml and the "psm2" mtl), or via libfabric (with the same "cm" pml and the
"ofi" mtl).
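
For example (untested here; module and easyconfig names are just examples),
the build would look like:

  eb --filter-deps=UCX --robot OpenMPI-4.1.1-GCC-11.2.0.eb

and at runtime you can force the PSM2 or libfabric path with:

  mpirun --mca pml cm --mca mtl psm2 -n 2 ./a.out    # libpsm2 directly
  mpirun --mca pml cm --mca mtl ofi -n 2 ./a.out     # via libfabric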

We use the same Open MPI binaries on multiple clusters but set this on
OmniPath:
OMPI_MCA_btl='^openib'
OMPI_MCA_osc='^ucx'
OMPI_MCA_pml='^ucx'
to disable UCX and openib at runtime. If you include UCX in EB's OpenMPI, it
will not compile in the "openib" btl, so the first of those three would not
be needed.
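
E.g. in a job script or site profile on the OmniPath nodes (a minimal
sketch; where you set these is up to your site):

  export OMPI_MCA_btl='^openib'
  export OMPI_MCA_osc='^ucx'
  export OMPI_MCA_pml='^ucx'
  mpirun -n 2 ./a.out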

Regards,
Bart

On Fri, 3 Dec 2021 at 07:29, Ole Holm Nielsen <[email protected]>
wrote:

> Hi Åke,
>
> On 12/3/21 08:27, Åke Sandgren wrote:
> >> On 02-12-2021 14:18, Åke Sandgren wrote:
> >>> On 12/2/21 2:06 PM, Ole Holm Nielsen wrote:
> >>>> These are updated observations of running OpenMPI codes with an
> >>>> Omni-Path network fabric on AlmaLinux 8.5::
> >>>>
> >>>> Using the foss-2021b toolchain and OpenMPI/4.1.1-GCC-11.2.0 my trivial
> >>>> MPI test code works correctly:
> >>>>
> >>>> $ ml OpenMPI
> >>>> $ ml
> >>>>
> >>>> Currently Loaded Modules:
> >>>>     1) GCCcore/11.2.0                   9) hwloc/2.5.0-GCCcore-11.2.0
> >>>>     2) zlib/1.2.11-GCCcore-11.2.0      10) OpenSSL/1.1
> >>>>     3) binutils/2.37-GCCcore-11.2.0    11) libevent/2.1.12-GCCcore-11.2.0
> >>>>     4) GCC/11.2.0                      12) UCX/1.11.2-GCCcore-11.2.0
> >>>>     5) numactl/2.0.14-GCCcore-11.2.0   13) libfabric/1.13.2-GCCcore-11.2.0
> >>>>     6) XZ/5.2.5-GCCcore-11.2.0         14) PMIx/4.1.0-GCCcore-11.2.0
> >>>>     7) libxml2/2.9.10-GCCcore-11.2.0   15) OpenMPI/4.1.1-GCC-11.2.0
> >>>>     8) libpciaccess/0.16-GCCcore-11.2.0
> >>>>
> >>>> $ mpicc mpi_test.c
> >>>> $ mpirun -n 2 a.out
> >>>>
> >>>> (null): There are 2 processes
> >>>>
> >>>> (null): Rank  1:  d008
> >>>>
> >>>> (null): Rank  0:  d008
> >>>>
> >>>>
> >>>> I also tried the OpenMPI/4.1.0-GCC-10.2.0 module, but this still gives
> >>>> the error messages:
> >>>>
> >>>> $ ml OpenMPI/4.1.0-GCC-10.2.0
> >>>> $ ml
> >>>>
> >>>> Currently Loaded Modules:
> >>>>     1) GCCcore/10.2.0                   8) libpciaccess/0.16-GCCcore-10.2.0
> >>>>     2) zlib/1.2.11-GCCcore-10.2.0       9) hwloc/2.2.0-GCCcore-10.2.0
> >>>>     3) binutils/2.35-GCCcore-10.2.0    10) libevent/2.1.12-GCCcore-10.2.0
> >>>>     4) GCC/10.2.0                      11) UCX/1.9.0-GCCcore-10.2.0
> >>>>     5) numactl/2.0.13-GCCcore-10.2.0   12) libfabric/1.11.0-GCCcore-10.2.0
> >>>>     6) XZ/5.2.5-GCCcore-10.2.0         13) PMIx/3.1.5-GCCcore-10.2.0
> >>>>     7) libxml2/2.9.10-GCCcore-10.2.0   14) OpenMPI/4.1.0-GCC-10.2.0
> >>>>
> >>>> $ mpicc mpi_test.c
> >>>> $ mpirun -n 2 a.out
> >>>> [1638449983.577933] [d008:910356:0]       ib_iface.c:966  UCX  ERROR
> >>>> ibv_create_cq(cqe=4096) failed: Operation not supported
> >>>> [1638449983.577827] [d008:910355:0]       ib_iface.c:966  UCX  ERROR
> >>>> ibv_create_cq(cqe=4096) failed: Operation not supported
> >>>> [d008.nifl.fysik.dtu.dk:910355] pml_ucx.c:273  Error: Failed to create UCP worker
> >>>> [d008.nifl.fysik.dtu.dk:910356] pml_ucx.c:273  Error: Failed to create UCP worker
> >>>>
> >>>> (null): There are 2 processes
> >>>>
> >>>> (null): Rank  0:  d008
> >>>>
> >>>> (null): Rank  1:  d008
> >>>>
> >>>> Conclusion: The foss-2021b toolchain with OpenMPI/4.1.1-GCC-11.2.0 seems
> >>>> to be required on systems with an Omni-Path network fabric on AlmaLinux
> >>>> 8.5.  Perhaps the newer UCX/1.11.2-GCCcore-11.2.0 is really what's
> >>>> needed, compared to UCX/1.9.0-GCCcore-10.2.0 from foss-2020b.
> >>>>
> >>>> Does anyone have comments on this?
> >>>
> >>> UCX is the problem here in combination with libfabric I think. Write a
> >>> hook that upgrades the version of UCX to 1.11-something if it's <
> >>> 1.11-ish, or just that specific version if you have older-and-working
> >>> versions.
> >>
> >> You are right that the nodes with Omni-Path have different libfabric
> >> packages which come from the EL8.5 BaseOS as well as the latest
> >> Cornelis/Intel Omni-Path drivers:
> >>
> >> $ rpm -qa | grep libfabric
> >> libfabric-verbs-1.10.0-2.x86_64
> >> libfabric-1.12.1-1.el8.x86_64
> >> libfabric-devel-1.12.1-1.el8.x86_64
> >> libfabric-psm2-1.10.0-2.x86_64
> >>
> >> The 1.12 packages are from EL8.5, and 1.10 packages are from Cornelis.
> >>
> >> Regarding UCX, I was first using the trusted foss-2020b toolchain which
> >> includes UCX/1.9.0-GCCcore-10.2.0. I guess that we shouldn't mess with
> >> the toolchains?
> >>
> >> The foss-2021b toolchain includes the newer UCX 1.11, which seems to
> >> solve this particular problem.
> >>
> >> Can we make any best practices recommendations from these observations?
> >
> > I didn't check properly, but UCX does not depend on libfabric, OpenMPI
> > does, so I'd write a hook that replaces libfabric < 1.12 with at least
> > 1.12.1.
> > Sometimes you just have to mess with the toolchains, and this looks like
> > one of those situations.
> >
> > Or as a test build your own OpenMPI-4.1.0 or 4.0.5 (that 2020b uses)
> > with an updated libfabric and check if that fixes the problem. And if it
> > does, write a hook that replaces libfabric. See the framework/contrib
> > for examples, I did that for UCX so there is code there to show you how.
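> >
> > Roughly something like this (untested sketch, the hooks file name is just
> > an example):
> >
> >   eb --hooks=$HOME/site_hooks.py --robot OpenMPI-4.0.5-GCC-10.2.0.eb
> >
> > with a parse_hook in site_hooks.py that bumps the libfabric version,
> > modelled on the examples under contrib/hooks in the framework repo.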
>
> I don't feel qualified to mess around with modifying EB toolchains...
>
> The foss-2021b toolchain including OpenMPI/4.1.1-GCC-11.2.0 seems to solve
> the present problem.  Do you think there are any disadvantages with asking
> users to go for foss-2021b?  Of course we may need several modules to be
> upgraded from foss-2020b to foss-2021b.
>
> Another possibility may be the coming driver upgrade from Cornelis
> Networks to support the Omni-Path fabric on EL 8.4 and EL 8.5.  I'm
> definitely going to check this when it becomes available.
>
> Thanks,
> Ole
>


-- 
Dr. Bart E. Oldeman | [email protected] | [email protected]
Scientific Computing Analyst / Analyste en calcul scientifique
McGill HPC Centre / Centre de Calcul Haute Performance de McGill |
http://www.hpc.mcgill.ca
Calcul Québec | http://www.calculquebec.ca
Compute/Calcul Canada | http://www.computecanada.ca
Tel/Tél: 514-396-8926 | Fax/Télécopieur: 514-396-8934
