Hi all,

I have the following use case. I have N MPI ranks in the global communicator, and I split it into two sub-communicators: one containing only rank 0, and the other containing ranks 1 to N-1. Rank 0 acts as the master and ranks [1, N-1] act as workers. Rank 0 broadcasts (blocking) a set of values to ranks [1, N-1] over comm_world, then immediately calls a blocking gather over comm_world and busy-waits for the results. Once the workers receive the broadcast, they call a method foo(args, local_comm). Inside foo, the workers communicate with each other over the sub-communicator, and each worker produces N-1 results, which are sent to rank 0 as its gather contributions over comm_world. foo runs multiple iterations, with collectives, send/receives, etc.
This seems to work with smaller numbers of ranks and smaller foo workloads. But when the parallelism increases (e.g., 64 to 512 ranks), only a single iteration inside foo completes; the subsequent iterations seem to hang. Is this an anti-pattern in MPI? Should I use igather/ibcast instead of the blocking calls?

Any help is greatly appreciated.

--
Niranda Perera
https://niranda.dev/
@n1r44 <https://twitter.com/N1R44>