I don't know how openmpi does it, but I've definitely seen packages where "make
clean" wipes the ".o" files but not the results of the configure process.
Sometimes there's a "make distclean" which tries to get back closer to
as-untarred state.
Noam
On Jul 18, 2023, at 12:51 PM, Jeffrey Layton wrote:
Looks like the issue is that mpi4py by default uses THREAD_MULTIPLE, which ucx
does not support. It would be nice if the OpenMPI pml selection code provided
information on what exactly caused ucx initialization to fail, but at least I
know how to work around my problem now.
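For what it's worth, the same request is easy to reproduce outside of Python; a minimal C sketch that asks for MPI_THREAD_MULTIPLE explicitly (what mpi4py requests by default) and reports what the library actually grants:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided;
        /* mpi4py's default initialization requests MPI_THREAD_MULTIPLE;
           asking for it explicitly shows what the library will grant. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE)
            printf("requested MPI_THREAD_MULTIPLE, got level %d\n", provided);
        MPI_Finalize();
        return 0;
    }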
I've been happily using OpenMPI 4.1.4 for a while, but I've run into a weird
new problem. I mainly use it with ucx, typically running with the mpirun flags
--bind-to core --report-bindings --mca pml ucx --mca osc ucx --mca btl
^vader,tcp,openib
and with our compiled Fortran codes it seems to work.
I don't think it's inherently true that multiple mpiruns interfere with each
other; we do that routinely, as far as I know. Any chance that your jobs are
doing something like writing to a common directory (like /tmp) and then
interfering with each other?
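If it does turn out to be something like that, one common way to avoid the collision is a unique per-run scratch directory; a minimal C sketch (the /tmp/myjob prefix is just a placeholder):

    #define _XOPEN_SOURCE 700   /* for mkdtemp() */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* mkdtemp() replaces the XXXXXX with a unique suffix, so two
           concurrent jobs can never end up sharing a scratch directory. */
        char scratch[] = "/tmp/myjob.XXXXXX";
        if (mkdtemp(scratch) == NULL) {
            perror("mkdtemp");
            return 1;
        }
        printf("per-job scratch dir: %s\n", scratch);
        return 0;
    }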
As an aside, you should consider the mpirun ...
Stdout from every process is gathered by mpirun and shown in the stdout of the
shell where mpirun was started. There's a command line option for mpirun to
label lines by the MPI task, "--tag-output" I think. There's some OpenMP
function you can use to determine the current OpenMP thread number, which you
could add to each line yourself.
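That function is omp_get_thread_num(); a minimal hybrid sketch that labels each output line with both the MPI rank and the OpenMP thread:

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        /* Each OpenMP thread labels its own output with the MPI rank and
           its thread number, complementing mpirun's per-task tagging. */
        #pragma omp parallel
        printf("rank %d, thread %d of %d\n",
               rank, omp_get_thread_num(), omp_get_num_threads());
        MPI_Finalize();
        return 0;
    }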
Yeah, that appears to have been the issue - IB is entirely dead (it's a new
machine, so maybe no subnet manager, or maybe a bad cable). I'll track that
down, and follow up here if there's still an issue once the low-level IB
problem is fixed.
However, given that ucx says it supports shared memory ...
Here is more information with higher verbosity:
> mpirun -np 2 --mca pml ucx --mca osc ucx --bind-to core --map-by core
> --rank-by core --mca pml_ucx_verbose 100 --mca osc_ucx_verbose 100 --mca
> bml_base_verbose 100 mpi_executable
[tin2:1137672] mca: base: components_register: registering framework ...
Hi all - I'm trying to get openmpi with ucx working on a new Rocky Linux 8 +
OpenHPC machine. I'm used to running with
mpirun --mca pml ucx --mca osc ucx --mca btl ^vader,tcp,openib --bind-to core
--map-by core --rank-by core
However, now it complains that it can't start the pml, with the message ...
Hi - I'm trying to run multi-node mixed OpenMP/MPI with each MPI task bound to
a set of cores. I thought this would be relatively straightforward with
"--map-by slot:PE=$OMP_NUM_THREADS --bind-to core", but I can't get it to work.
I couldn't figure out if it was a bug or just something missing.
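For what it's worth, one way to check what binding each thread actually ends up with, beyond --report-bindings, is to have every thread report the core it is running on; a minimal sketch, assuming Linux/glibc for sched_getcpu():

    #define _GNU_SOURCE   /* sched_getcpu() is glibc-specific */
    #include <mpi.h>
    #include <omp.h>
    #include <sched.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        /* Every thread reports the core it is currently running on, so
           the layout can be compared against what mpirun claims. */
        #pragma omp parallel
        printf("rank %d thread %d on cpu %d\n",
               rank, omp_get_thread_num(), sched_getcpu());
        MPI_Finalize();
        return 0;
    }

If threads of one rank report cores outside the set that --report-bindings shows for that rank, the binding isn't being applied the way you expect.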