Waitall was not returning for the mundane reason that not all messages
sent were received. I'm not sure why the dump command seemed to say
there was nothing waiting. Ironically, the bug would never appear in
production, only in testing.
I fixed up my shutdown logic and all seems well now.
Ross
On 4/10/2014 1:06 PM, Ross Boylan wrote:
On 4/10/2014 11:48 AM, Ross Boylan wrote:
On 4/9/2014 5:26 PM, Ross Boylan wrote:
On Fri, 2014-04-04 at 22:40 -0400, George Bosilca wrote:
Ross,
I’m not familiar with the R implementation you are using, but bear
with me and I will explain how you can ask Open MPI for the list
of all pending requests on a process. Disclosure: this is Open MPI
deep voodoo, an extreme way to debug applications, but it might save
you quite some time.
The only thing you need is the communicator you posted your
requests into, or at least a pointer to it. Then you attach to your
process (or processes) with your preferred debugger and call
mca_pml_ob1_dump(struct ompi_communicator_t* comm, int verbose)
With gdb this should look like “call mca_pml_ob1_dump(my_comm, 1)”.
This will dump human readable information about all the requests
pending on a communicator (both sends and receives).
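(For reference, a minimal session following those steps might look like the sketch below; the PID and the communicator variable name `my_comm` are placeholders for whatever your process and program actually use:)

```
$ gdb -p <pid-of-mpi-process>            # attach to the running rank
(gdb) call mca_pml_ob1_dump(my_comm, 1)  # my_comm: a struct ompi_communicator_t*
(gdb) detach
(gdb) quit
```

Attaching pauses the process, so the dump is a consistent snapshot of the pending sends and receives at that moment.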
Thank you so much for the tip. After inserting a barrier failed to
help, I managed to reproduce the problem with all ranks on one node.
I see
BTL SM 0x7fe9970ae660 endpoint 0x1f13470 [smp_rank 5] [peer_rank 0]
....
BTL SM 0x7fe9970ae660 endpoint 0x20eebb0 [smp_rank 5] [peer_rank 12]
which, if my previous theory about mca_pml_ob1_dump is correct, means
there are no outstanding requests, since no items are listed under
the BTL lines.
This again has me wondering whether requests can be completed without
some kind of Wait or Test call.
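(For what it's worth, the one standard path I know of that retires a request handle without a Wait or Test is MPI_Request_free: the underlying operation still completes in the background, but the request no longer shows up as pending. A hedged C sketch, with illustrative identifiers, that needs an MPI installation to compile:)

```
/* Sketch only: after MPI_Request_free, no Wait/Test is ever issued on
 * req, yet the send still completes behind the scenes, so a later dump
 * of the communicator would show nothing outstanding for it. */
#include <mpi.h>

void fire_and_forget(const int *buf, int count, int dest, MPI_Comm comm)
{
    MPI_Request req;
    MPI_Isend(buf, count, MPI_INT, dest, /* tag = */ 0, comm, &req);
    MPI_Request_free(&req);  /* request handle released without Wait/Test */
}
```

Whether the R bindings ever do this on your behalf is another question, but it is one way a request can vanish without an explicit Wait.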
Sometimes the system runs to completion; the trigger seems to be
having some ranks finish quickly because there are more worker
processes than there is work for them to do.