Hi all,

I have the following use case. I have N MPI ranks in the global
communicator, and I split it into two: one containing only rank 0, and the
other containing ranks 1 through N-1.
Rank 0 acts as the master and ranks [1, N-1] act as workers. Rank 0
broadcasts (blocking) a set of values to ranks [1, N-1] over comm_world,
then immediately calls a blocking gather over comm_world and busy-waits for
the results. Once the workers receive the broadcast, they call a method
foo(args, local_comm). Inside foo, the workers communicate with each other
using the subcommunicator, and each produces N-1 results, which are sent to
rank 0 as gather responses over comm_world. Inside foo there are multiple
iterations, collectives, send-receives, etc.
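
To make the rank layout above concrete, here is a minimal sketch in plain
Python (it makes no MPI calls). It assumes the split uses color 0 for rank 0
and color 1 for every other rank, with the world rank as the ordering key.
The function names are made up for illustration and are not part of any MPI
binding.

```python
# Plain-Python model of the rank layout described above (no MPI calls).
# All names here are hypothetical, for illustration only.


def split_color(world_rank: int) -> int:
    """Color for the communicator split: 0 for the master, 1 for workers."""
    return 0 if world_rank == 0 else 1


def layout(n: int) -> dict:
    """Group world ranks 0..n-1 by color, preserving world-rank order."""
    groups = {0: [], 1: []}
    for rank in range(n):
        groups[split_color(rank)].append(rank)
    return groups


def local_rank(world_rank: int, n: int) -> int:
    """Rank inside the subcommunicator, assuming the split key is the world rank."""
    return layout(n)[split_color(world_rank)].index(world_rank)


if __name__ == "__main__":
    n = 8
    groups = layout(n)
    print("master group:", groups[0])
    print("worker group:", groups[1])
    print("world rank 5 has local rank", local_rank(5, n))
```

Under these assumptions, a worker with world rank r has rank r-1 in
local_comm, while rank 0 is alone in its own group.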

This seems to work fine at lower parallelism and with smaller foo tasks.
But when the parallelism increases (e.g. 64 ... 512), only a single
iteration completes inside foo. Subsequent iterations seem to hang.

Is this an anti-pattern in MPI? Should I use igather and ibcast instead of
the blocking calls?

Any help is greatly appreciated.

-- 
Niranda Perera
https://niranda.dev/
@n1r44 <https://twitter.com/N1R44>
