Charles,

Having implemented some of the underlying collective algorithms, I am puzzled by the need to force the sync to 1 to have things flowing. I would definitely appreciate a reproducer so that I can identify (and hopefully fix) the underlying problem.
Thanks,
George.

On Tue, Oct 29, 2019 at 2:20 PM Garrett, Charles via users <users@lists.open-mpi.org> wrote:

> Last time I did a reply on here, it created a new thread. Sorry about
> that everyone. I just hit the Reply via email button. Hopefully this one
> will work.
>
> To Gilles Gouaillardet:
>
> My first thread has a reproducer that causes the problem.
>
> To George Bosilca:
>
> I had to set coll_sync_barrier_before=1. Even setting to 10 did not fix
> my problem. I was surprised by this and I’m still surprised given your
> comment on setting to larger than a few tens. Thanks for the explanation
> about the problem.
>
> Charles Garrett