Hi,
There are a few things that you could test to see whether they make a difference.
1. Try changing the number of aggregators used in collective I/O (assuming
that the code uses collective I/O). For example, you could set it to the number
of nodes used (the algorithm determining the number
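
As a rough sketch of that suggestion, the snippet below fixes the aggregator count to one per node through Open MPI's MCA environment variables. It assumes the job uses the ompio component and its `io_ompio_num_aggregators` parameter; the node count, process count, and application name are placeholders, not values from this thread.

```shell
# Placeholder values: adjust these to the actual job.
NUM_NODES=4          # number of nodes in the job (assumption)
NUM_PROCS=16         # total number of MPI processes (assumption)
APP=./my_io_app      # hypothetical application binary

# Select the ompio component and set the number of aggregators to one per node.
# Open MPI reads MCA parameters from environment variables named OMPI_MCA_<param>.
export OMPI_MCA_io=ompio
export OMPI_MCA_io_ompio_num_aggregators="$NUM_NODES"

# Print the launch command rather than running it, so the sketch can be
# inspected on a machine without an MPI installation.
echo "mpirun -np $NUM_PROCS $APP"
```

Comparing I/O timings for a few different aggregator counts (for instance half or twice the node count) should show whether this setting matters for the workload.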
[AMD Official Use Only - General]
I can also offer to help if there are any questions regarding the ompio code,
but I do not have the bandwidth/resources to do that myself, and more
importantly, I do not have a platform to test the new component.
Edgar
From: users On Behalf Of Jeff Squyres
(js
There was work done in ompio in that direction, but the code wasn’t actually
committed into the main repository. It probably still exists in a branch
somewhere. If you are interested, please ping me directly and I can put you in
contact with the person who wrote the code and to clarify the
UCX will disqualify itself unless it finds CUDA, ROCm, or an InfiniBand network
to use. To allow UCX to run on a regular shared-memory job without GPUs or IB,
you have to set the UCX_TLS environment variable to explicitly allow UCX to use
shm, e.g.:
mpirun -x UCX_T
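
For illustration only, a shared-memory-only launch might look like the sketch below. The transport list `shm,self`, the explicit `--mca pml ucx` selection, the process count, and the application name are assumptions and are not taken from the message above.

```shell
# Restrict UCX to the shared-memory ("shm") and loopback ("self") transports.
export UCX_TLS=shm,self

NUM_PROCS=4      # number of MPI processes (assumption)
APP=./my_app     # hypothetical application binary

# "-x UCX_TLS" forwards the variable to the launched processes and
# "--mca pml ucx" asks Open MPI to use the UCX PML.
# The command is printed rather than executed so the sketch can be
# inspected without an MPI installation.
echo "mpirun -x UCX_TLS --mca pml ucx -np $NUM_PROCS $APP"
```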