Hello,
We're seeing jobs fail non-deterministically when several of them run
concurrently within a "make test" framework.
"make test" is launched from a shell script running inside a Podman
container, and we're typically running with "-j 20" and "-np 4" (20 jobs
Greg,
If Open MPI was built with UCX, your jobs will likely use UCX (and the
shared memory provider) even if running on a single node.
You can run

    mpirun --mca pml ob1 --mca btl self,sm ...

if you want to avoid using UCX.
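If the mpirun command line is generated inside "make test" and is awkward to edit, the same selection can probably be made from the environment, since Open MPI also reads MCA parameters from OMPI_MCA_<name> variables. The following is only a sketch: it assumes the variables are exported in the wrapper script before "make test" starts and that they survive into the processes that call mpirun. The ompi_info check is optional and is skipped if the tool is not on the PATH.

```shell
#!/bin/sh
# Sketch: select the ob1 PML and the self,sm BTLs through the environment
# instead of editing the mpirun command line used by "make test".

# Equivalent to: mpirun --mca pml ob1 --mca btl self,sm ...
export OMPI_MCA_pml=ob1
export OMPI_MCA_btl=self,sm

# Optional: list UCX-related components, if ompi_info is installed.
# No output here suggests this Open MPI build has no UCX support.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | grep -i ucx || true
fi

# The wrapper script would then launch the tests as before
# (left commented out because the exact invocation is not known here):
# make test
```

Options given explicitly on the mpirun command line take precedence over these variables, so this only helps if "make test" does not already pass its own --mca pml or --mca btl settings.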
What is a typical mpirun command line used under the hood by your "make
test"?