Hi Gilles,
Thanks for your assistance.
I tried the recommended settings but got an error saying that “sm” is no longer
available in Open MPI 3.0+ and that “vader” should be used instead. I then tried
“--mca pml ob1 --mca btl self,vader” but got the original error again:
[podman-ci-rocky-8.8:09900] MC
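To check which BTL components a given Open MPI build actually provides (for example, that it ships vader rather than sm), the ompi_info output can be filtered. This is a minimal sketch that assumes the "MCA btl: <name> (...)" line format printed by Open MPI 4.x. The function name list_btls is hypothetical and is not part of Open MPI.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): list the BTL components an Open MPI build
# provides by parsing ompi_info output read from stdin.
# Assumed line format: "   MCA btl: vader (MCA v2.1.0, API v3.1.0, ...)".
list_btls() {
  # Split each line on the literal text "MCA btl: ". Lines without it have a
  # single field and are skipped. Otherwise print the first word after it.
  awk -F'MCA btl: ' 'NF > 1 { split($2, parts, " "); print parts[1] }'
}

# Usage on a real system (requires ompi_info on PATH):
#   ompi_info | list_btls
```

If vader (and not sm) appears in that list, the "--mca btl self,vader" selection at least names components that exist in the build.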
Good afternoon MPI fans of all ages,
Yet again, I'm getting an error that I'm having trouble interpreting. This
time, I'm trying to run ior. I've done it a thousand times, but never on an
NVIDIA DGX A100 with multiple NICs.
The ultimate command is the following:
/cm/shared/apps/openmpi4/gcc/4.1.5/
Hi Jeffrey,
I would suggest trying to debug what may be going wrong with UCX on your DGX
box.
There are several things to try in the UCX FAQ:
https://openucx.readthedocs.io/en/master/faq.html
I’d suggest setting the UCX_LOG_LEVEL environment variable to info or debug and
seeing if UCX says so
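One way to apply the UCX_LOG_LEVEL suggestion to an mpirun launch is to export the variable to every rank with mpirun's -x NAME=value option. This is a minimal sketch that assumes Open MPI's mpirun is on PATH. The wrapper name ucx_debug_mpirun and its DRY_RUN switch are hypothetical conveniences and are not part of Open MPI or UCX.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical wrapper): launch an MPI job with UCX logging raised so
# UCX reports what it is doing on each rank. Open MPI's "-x NAME=value"
# exports NAME with that value to all launched processes.
ucx_debug_mpirun() {
  # Respect a UCX_LOG_LEVEL already set by the caller; default to "debug".
  local level="${UCX_LOG_LEVEL:-debug}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it, to check the arguments.
    printf '%s\n' "mpirun -x UCX_LOG_LEVEL=${level} $*"
  else
    mpirun -x "UCX_LOG_LEVEL=${level}" "$@"
  fi
}

# Example (hypothetical rank count and ior arguments):
#   UCX_LOG_LEVEL=info ucx_debug_mpirun -np 8 ior -a POSIX -t 1m -b 1g
```

The info level is less verbose than debug, so it may be worth starting there and switching to debug only if the output is inconclusive.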