Charles,

There is a known issue with calling collectives in a tight loop, caused by
the lack of flow control at the network level. It results in a significant
slowdown that might look like a deadlock to users. The workaround is to
enable the sync collective module, which inserts a fake barrier at regular
intervals in the tight collective loop, allowing a more streamlined use of
the network.

Run `ompi_info --param coll sync -l 9` to see the options you can tune. I
think setting either coll_sync_barrier_before or coll_sync_barrier_after to
anything larger than a few tens should be good enough.
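
As a rough sketch of what that could look like, MCA parameters can be set
through an OMPI_MCA_-prefixed environment variable or on the mpirun command
line. The interval of 50 is only an illustrative value picked to match "a few
tens", not a tuned recommendation, and ./my_app and -np 4 are placeholders
for your own program and process count.

```shell
# Assumption: 50 is an illustrative interval, not a tuned value.
# Ask the sync module to insert a barrier before every 50th collective.
export OMPI_MCA_coll_sync_barrier_before=50

# Equivalent per-run form (./my_app and -np 4 are placeholders):
#   mpirun --mca coll_sync_barrier_before 50 -np 4 ./my_app
```

If the slowdown persists, it may be worth trying a smaller interval or
coll_sync_barrier_after instead.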

  George.


On Mon, Oct 28, 2019 at 9:29 PM Gilles Gouaillardet via users <
users@lists.open-mpi.org> wrote:

> Charles,
>
>
> Unless you expect yes or no answers, can you please post a simple
> program that demonstrates the issue you are facing?
>
>
> Cheers,
>
>
> Gilles
>
> On 10/29/2019 6:37 AM, Garrett, Charles via users wrote:
> >
> > Does anyone have any idea why this is happening?  Has anyone seen this
> > problem before?
> >
>
