Charles,
There is a known issue with calling collectives in a tight loop, due to a
lack of flow control at the network level. It results in a significant
slow-down that might appear as a deadlock to users. The workaround is to
enable the sync collective module, which will insert a fake barrier every
few collective operations (the interval is controlled by the
coll_sync_barrier_before MCA parameter).
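To make the pattern concrete, here is a minimal sketch of the kind of loop
that triggers this (my own illustration, not the actual reproducer from
this thread):

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    double in = 1.0, out;
    for (int i = 0; i < 1000000; i++) {
        /* Back-to-back small collectives with no other synchronization:
         * without flow control, faster ranks keep injecting messages
         * that slower ranks cannot drain, and progress crawls until it
         * looks like a hang. */
        MPI_Allreduce(&in, &out, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}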
Last time I replied on here, it created a new thread. Sorry about that,
everyone. I just hit the "Reply via email" button. Hopefully this one will work.
To Gilles Gouaillardet:
My first thread has a reproducer that causes the problem.
To George Bosilca:
I had to set coll_sync_barrier_before=1.
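On the command line that looks like this (a sketch; substitute your own
rank count and binary for <N> and <app>):

% mpirun --mca coll_sync_barrier_before 1 -np <N> <app>

As I understand it, a value of 1 makes the sync module insert an
MPI_Barrier before every collective, which throttles the loop at the cost
of extra synchronization.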
Charles,
Having implemented some of the underlying collective algorithms, I am
puzzled by the need to force the sync to 1 to have things flowing. I would
definitely appreciate a reproducer so that I can identify (and hopefully
fix) the underlying problem.
Thanks,
George.
On Tue, Oct 29, 2019, at 7:30 PM, Kulshrestha, Vipul via users
<users@lists.open-mpi.org> wrote:
Hi,
We recently shifted from Open MPI 2.0.1 to 4.0.1 and are seeing an important
behavior change with respect to the --output-filename option.
We invoke mpirun as
% mpirun --output-filename /app.log -np <N> <app>
With 2.0.1, the above produced /app.log.<rank> files for stdout of the
application, where <rank> is the rank of the process.
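With 4.0.1, the same invocation behaves differently: as far as we can tell,
the argument is now treated as a directory, and the per-rank output lands
in a tree that looks roughly like this (the exact layout, including the
job-ID directory "1", is our reconstruction):

/app.log/1/rank.0/stdout
/app.log/1/rank.0/stderr
/app.log/1/rank.1/stdout
...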