Assuming a correct implementation, the communication pattern you describe
should work seamlessly.
Would it be possible to either share a reproducer or provide the execution
stacks, obtained by attaching a debugger to the deadlocked application, so
we can see the state of the different processes? I wonder whether all
processes eventually join the gather on comm_world, or whether some of them
are stuck in some orthogonal collective communication pattern.


On Fri, Sep 9, 2022, 21:24 Niranda Perera via users <> wrote:

> Hi all,
> I have the following use case. I have N mpi ranks in the global
> communicator, and I split it into two, first being rank 0, and the other
> being all ranks from 1-->N-1.
> Rank0 acts as a master and ranks [1, N-1] act as workers. I use rank0 to
> broadcast (blocking) a set of values to ranks [1, N-1] over comm_world.
> Rank0 then immediately calls a gather (blocking) over comm_world and
> busy-waits for results. Once the broadcast is received by workers, they call
> a method foo(args, local_comm). Inside foo, workers communicate with each
> other using the subcommunicator, and each produce N-1 results, which would
> be sent to Rank0 as gather responses over comm_world. Inside foo there are
> multiple iterations, collectives, send-receives, etc.
> This seems to be working okay with smaller parallelism and smaller tasks
> of foo. But when the parallelism increases (e.g. 64... 512), only a single
> iteration completes inside foo. Subsequent iterations seem to hang.
> Is this an anti-pattern in MPI? Should I use igather, ibcast instead of
> blocking calls?
> Any help is greatly appreciated.
> --
> Niranda Perera
> @n1r44 <>
