Assuming a correct implementation, the described communication pattern should work seamlessly.
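
For reference, here is a minimal C sketch of the pattern as I understand it (the payloads, counts, and the body of foo below are placeholders, not your actual code); if the real code follows this shape, the blocking bcast/gather pair on comm_world should not deadlock on its own:

    #include <mpi.h>
    #include <stdlib.h>

    /* Placeholder for the workers' routine: its internal iterations,
       collectives, and send-receives must use only local_comm. */
    static int foo(int args, MPI_Comm local_comm) {
        int lrank;
        MPI_Comm_rank(local_comm, &lrank);
        MPI_Barrier(local_comm);   /* stands in for the internal collectives */
        return args + lrank;       /* dummy per-worker result */
    }

    int main(int argc, char **argv) {
        int rank, size, args = 0, result = 0, *results = NULL;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* rank 0 in one group, ranks 1..N-1 in the other */
        MPI_Comm local_comm;
        MPI_Comm_split(MPI_COMM_WORLD, rank == 0 ? 0 : 1, rank, &local_comm);

        if (rank == 0) {
            args = 42;                            /* dummy broadcast payload */
            results = malloc(size * sizeof(int));
        }
        MPI_Bcast(&args, 1, MPI_INT, 0, MPI_COMM_WORLD);
        if (rank != 0)
            result = foo(args, local_comm);
        /* every rank of comm_world must reach this, rank 0 included */
        MPI_Gather(&result, 1, MPI_INT, results, 1, MPI_INT, 0, MPI_COMM_WORLD);

        free(results);
        MPI_Comm_free(&local_comm);
        MPI_Finalize();
        return 0;
    }

Note that foo must only ever touch local_comm: a single collective issued on the wrong communicator, or skipped by one rank in a later iteration, is enough to wedge everyone, and that tends to surface only at larger scale.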
Would it be possible to either share a reproducer, or attach a debugger to the deadlocked application and provide the execution stacks, so we can see the state of the different processes? I wonder whether all processes eventually join the gather on comm_world, or whether some of them are stuck in some orthogonal collective communication pattern.

George

On Fri, Sep 9, 2022, 21:24 Niranda Perera via users <users@lists.open-mpi.org> wrote:

> Hi all,
>
> I have the following use case. I have N MPI ranks in the global
> communicator, and I split it into two: the first part being rank 0, and
> the other being all ranks 1 to N-1.
> Rank 0 acts as a master and ranks [1, N-1] act as workers. I use rank 0
> to broadcast (blocking) a set of values to ranks [1, N-1] over
> comm_world. Rank 0 then immediately calls a gather (blocking) over
> comm_world and busy-waits for the results. Once the broadcast is
> received by the workers, they call a method foo(args, local_comm).
> Inside foo, workers communicate with each other using the
> subcommunicator, and they produce N-1 results in total, which are sent
> to rank 0 as gather responses over comm_world. Inside foo there are
> multiple iterations, collectives, send-receives, etc.
>
> This seems to work okay with smaller parallelism and smaller foo tasks.
> But when the parallelism increases (e.g., 64... 512), only a single
> iteration completes inside foo. Subsequent iterations seem to hang.
>
> Is this an anti-pattern in MPI? Should I use igather and ibcast instead
> of the blocking calls?
>
> Any help is greatly appreciated.
>
> --
> Niranda Perera
> https://niranda.dev/
> @n1r44 <https://twitter.com/N1R44>