Hi Joachim and George,

Thank you for your responses. I tried MPI_Issend instead of the MPI_Isend calls (I only have isends). For smaller parallelism it still works without any deadlocks, but the deadlocks are still there at larger parallelisms. One thing I forgot to mention is that if I drop the bcast/gather over comm_world and just pass comm_world to my foo function, it works perfectly fine at any parallelism (tested up to 720). Let me try MUST/TotalView and see.
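Regarding a reproducer: below is a minimal sketch of the communication pattern in plain C/MPI, with a placeholder foo() and a dummy NVALS buffer size, so it is not the actual Cylon code, but it shows how each rank interleaves the comm_world bcast/gather with the sub-communicator work:

/*
 * Minimal sketch of the pattern, not the actual Cylon code:
 * rank 0 = master, ranks 1..N-1 = workers on a sub-communicator.
 * foo() and NVALS are placeholders. Build with: mpicc pattern.c -o pattern
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NVALS 4

/* Workers exchange data over the sub-communicator only. MPI_Issend is
   used instead of MPI_Isend to force synchronous-send semantics. */
static double foo(const double *args, MPI_Comm local_comm)
{
    int lrank, lsize;
    MPI_Comm_rank(local_comm, &lrank);
    MPI_Comm_size(local_comm, &lsize);

    double send = args[0] + lrank, recv = 0.0;
    int dst = (lrank + 1) % lsize;
    int src = (lrank - 1 + lsize) % lsize;

    MPI_Request req;
    MPI_Issend(&send, 1, MPI_DOUBLE, dst, 0, local_comm, &req);
    MPI_Recv(&recv, 1, MPI_DOUBLE, src, 0, local_comm, MPI_STATUS_IGNORE);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
    return recv;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Split comm_world: rank 0 alone, ranks 1..N-1 share local_comm. */
    MPI_Comm local_comm;
    MPI_Comm_split(MPI_COMM_WORLD, rank == 0 ? 0 : 1, rank, &local_comm);

    double args[NVALS] = {0};
    if (rank == 0)
        for (int i = 0; i < NVALS; i++) args[i] = i + 1.0;

    /* Master broadcasts the task arguments over comm_world ... */
    MPI_Bcast(args, NVALS, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* ... workers compute inside the sub-communicator ... */
    double result = (rank == 0) ? 0.0 : foo(args, local_comm);

    /* ... and the master gathers one result per rank over comm_world. */
    double *results = NULL;
    if (rank == 0) results = malloc(size * sizeof(double));
    MPI_Gather(&result, 1, MPI_DOUBLE, results, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        printf("master gathered %d results\n", size);
        free(results);
    }

    MPI_Comm_free(&local_comm);
    MPI_Finalize();
    return 0;
}

In the real application foo() runs many iterations of collectives and point-to-point exchanges over local_comm, but the comm_world bcast/gather framing is the same as above.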
I am working on a distributed-memory dataframe system, Cylon [1]. As part of an integration, we are using Parsl [2] to schedule Cylon applications. Parsl provides Cylon a sub-communicator with the requested parallelism, while it uses comm_world to bcast tasks to and gather outputs from each Cylon worker. Cylon abstracts out the communication implementations and currently supports MPI, Gloo, and UCX. I've tested the same scenario on Gloo (Gloo can spawn a comm network from an MPI communicator) and haven't had any problems with it (apart from it being slower than MPI).

Best

[1] https://github.com/cylondata/cylon
[2] https://github.com/Parsl/parsl

On Sun, Sep 11, 2022 at 6:39 PM Protze, Joachim via users <users@lists.open-mpi.org> wrote:

> Hi,
>
> A source of sudden deadlocks at larger scale can be a change of send behavior from buffered to synchronous mode. You can check whether your application deadlocks at smaller scale if you replace all sends by ssends (e.g., add `#define MPI_Send MPI_Ssend` and `#define MPI_Isend MPI_Issend` after the include of the MPI header).
> An application with a correct communication pattern should run with synchronous sends without deadlock.
> To check for other deadlock patterns in your application you can use tools like MUST [1] or TotalView.
>
> Best
> Joachim
>
> [1] https://itc.rwth-aachen.de/must/
> ------------------------------
> *From:* users <users-boun...@lists.open-mpi.org> on behalf of George Bosilca via users <users@lists.open-mpi.org>
> *Sent:* Sunday, September 11, 2022 10:40:42 PM
> *To:* Open MPI Users <users@lists.open-mpi.org>
> *Cc:* George Bosilca <bosi...@icl.utk.edu>
> *Subject:* Re: [OMPI users] Subcommunicator communications do not complete intermittently
>
> Assuming a correct implementation, the described communication pattern should work seamlessly.
>
> Would it be possible to either share a reproducer or provide the execution stack by attaching a debugger to the deadlocked application, to see the state of the different processes? I wonder whether all processes eventually join the gather on comm_world, or whether some of them are stuck on some orthogonal collective communication pattern.
>
> George
>
> On Fri, Sep 9, 2022, 21:24 Niranda Perera via users <users@lists.open-mpi.org> wrote:
>
> Hi all,
>
> I have the following use case. I have N MPI ranks in the global communicator, and I split it into two: the first part being rank 0, and the other being all ranks from 1 to N-1.
> Rank 0 acts as a master and ranks [1, N-1] act as workers. I use rank 0 to broadcast (blocking) a set of values to ranks [1, N-1] over comm_world. Rank 0 then immediately calls a gather (blocking) over comm_world and busy-waits for results. Once the broadcast is received by the workers, they call a method foo(args, local_comm). Inside foo, workers communicate with each other using the subcommunicator, and each produces N-1 results, which are sent to rank 0 as gather responses over comm_world. Inside foo there are multiple iterations, collectives, send-receives, etc.
>
> This seems to work okay with smaller parallelism and smaller foo tasks. But when the parallelism increases (e.g., 64 ... 512), only a single iteration completes inside foo. Subsequent iterations seem to hang.
>
> Is this an anti-pattern in MPI? Should I use igather/ibcast instead of the blocking calls?
>
> Any help is greatly appreciated.
>
> --
> Niranda Perera
> https://niranda.dev/
> @n1r44 <https://twitter.com/N1R44>

--
Niranda Perera
https://niranda.dev/
@n1r44 <https://twitter.com/N1R44>