did you happen to get 4.7.1 (which comes with ucx-1.7.0-1.47100)
compiled against openmpi 4.0.2?
i got snagged by this
https://github.com/open-mpi/ompi/issues/7128
which i thought would have had the fixes merged into the v4.0.2 tag,
but it doesn't seem so in my case
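one quick way to check which UCX your openmpi build actually picked up (this assumes `ucx_info` and `ompi_info` are on your PATH, which they should be with MLNX_OFED and a working Open MPI install):

```shell
# print the version of the UCX library installed on the node
ucx_info -v

# check whether this Open MPI build has the UCX PML compiled in
ompi_info | grep -i ucx
```

if the UCX PML doesn't show up in the `ompi_info` output, the build didn't find UCX at configure time.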
On Fri, Feb 7, 2020 at 11:34 AM
We're using MLNX_OFED 4.7.3, which supplies UCX 1.7.0.
We have OpenMPI 4.0.2 compiled against the Mellanox OFED 4.7.3-provided versions of UCX, KNEM and
HCOLL, along with HWLOC 2.1.0 from the OpenMPI site.
I mirrored the build to be what Mellanox used to configure OpenMPI in HPC-X 2.5.
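For reference, a configure invocation along these lines is what that kind of build looks like. The install prefixes below are placeholders for wherever MLNX_OFED put UCX/KNEM/HCOLL on your system, and the exact flag set Mellanox uses in HPC-X may differ; `ompi_info -c` on an HPC-X build shows the flags they actually used.

```shell
# Sketch of an Open MPI 4.0.2 configure against the MLNX_OFED-provided
# libraries. All paths below are placeholders for your site's layout.
./configure \
    --with-ucx=/opt/mellanox/ucx \
    --with-knem=/opt/knem \
    --with-hcoll=/opt/mellanox/hcoll \
    --with-hwloc=/path/to/hwloc-2.1.0 \
    --with-platform=contrib/platform/mellanox/optimized
```

The `--with-platform=contrib/platform/mellanox/optimized` file ships in the Open MPI source tree and bundles the option set Mellanox uses for its optimized builds.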
I have user
i haven't compiled openmpi in a while, but i'm in the process of
upgrading our cluster.
the last time i did this there were specific versions of mpi/pmix/ucx
that were all tested and supposed to work together. my understanding
was that this was because pmi/ucx were under rapid development and the APIs