On 4/9/2014 5:26 PM, Ross Boylan wrote:
On Fri, 2014-04-04 at 22:40 -0400, George Bosilca wrote:
Ross,
I’m not familiar with the R implementation you are using, but bear with me and
I will explain how you can ask Open MPI for the list of all pending requests
on a process. Disclosure: This is Open MPI deep voodoo, an extreme way to debug
applications that might save you quite some time.
The only thing you need is the communicator you posted your requests into, or
at least a pointer to it. Then you attach to your process (or processes) with
your preferred debugger and call
mca_pml_ob1_dump(struct ompi_communicator_t* comm, int verbose)
With gdb this should look like “call mca_pml_ob1_dump(my_comm, 1)”. This will
dump human readable information about all the requests pending on a
communicator (both sends and receives).
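A sketch of what that session might look like (the PID and my_comm below are placeholders for your own stuck process and communicator pointer, not anything specific to your setup):
$ gdb -p <pid of the stuck rank>
(gdb) call mca_pml_ob1_dump(my_comm, 1)
(gdb) detach
(gdb) quit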
Thank you so much for the tip. After inserting a barrier failed to help,
I decided to try this. After much messing around (details below), here is what I got:
BTL SM 0x7f615dea9660 endpoint 0x3c15d90 [smp_rank 5] [peer_rank 0]
BTL SM 0x7f615dea9660 endpoint 0x3b729e0 [smp_rank 5] [peer_rank 1]
BTL SM 0x7f615dea9660 endpoint 0x3b72ad0 [smp_rank 5] [peer_rank 2]
BTL SM 0x7f615dea9660 endpoint 0x3c06e60 [smp_rank 5] [peer_rank 3]
BTL SM 0x7f615dea9660 endpoint 0x3c06f50 [smp_rank 5] [peer_rank 4]
[n2:10664] [Rank 0]
[n2:10664] [Rank 1]
[n2:10664] [Rank 2]
[n2:10664] [Rank 3]
[n2:10664] [Rank 4]
[n2:10664] [Rank 5]
[n2:10664] [Rank 6]
[n2:10664] [Rank 7]
[n2:10664] [Rank 8]
[n2:10664] [Rank 9]
[n2:10664] [Rank 10]
[n2:10664] [Rank 11]
[n2:10664] [Rank 12]
[n2:10664] [Rank 13]
After tracing through the code, things look different, though still odd.
First, the output above is out of sequence.
Second, I think the BTLs are transport mechanisms, or something similar,
not actual messages.
If there were messages, they would be listed underneath. There aren't any.
So I think this shows there is nothing to wait on, as I suspected.
Except I seem to be missing info for the remote ranks.
Is there any way a request can be completed absent a Wait or Test on the
request?
Third, I'm seeing BTLs listed for one rank I do communicate with (rank 0) and
four ranks I do not communicate with. Ranks 0-5 are local and the
rest are remote. Rank 5 does communicate with all the remote nodes, but
absolutely nothing is listed for them. When I trace from
bml_btl->btl->btl_dump(bml_btl->btl, bml_btl->btl_endpoint, verbose)
in mca_pml_ob1_dump, I end up (gdb in emacs) at
void mca_btl_base_dump(
    struct mca_btl_base_module_t* btl,
    struct mca_btl_base_endpoint_t* endpoint,
    int verbose)
{
    /* empty body: nothing is ever printed for these endpoints */
}
The function is a no-op, which sort of explains why I'm seeing nothing
for those ranks, but it doesn't seem quite right.
The pending messages are likely to be to the remote ranks.
Ross
In-sequence output:
[n2:11695] [Rank 0]
BTL SM 0x7fa37e1b4660 endpoint 0x31a7d70 [smp_rank 5] [peer_rank 0]
[n2:11695] [Rank 1]
BTL SM 0x7fa37e1b4660 endpoint 0x31049e0 [smp_rank 5] [peer_rank 1]
[n2:11695] [Rank 2]
BTL SM 0x7fa37e1b4660 endpoint 0x3104ad0 [smp_rank 5] [peer_rank 2]
[n2:11695] [Rank 3]
BTL SM 0x7fa37e1b4660 endpoint 0x3198e60 [smp_rank 5] [peer_rank 3]
[n2:11695] [Rank 4]
BTL SM 0x7fa37e1b4660 endpoint 0x3198f50 [smp_rank 5] [peer_rank 4]
[n2:11695] [Rank 5]
[n2:11695] [Rank 6]
[n2:11695] [Rank 7]
[n2:11695] [Rank 8]
[n2:11695] [Rank 9]
[n2:11695] [Rank 10]
[n2:11695] [Rank 11]
[n2:11695] [Rank 12]
[n2:11695] [Rank 13]
Not entirely human readable if the human is me!
Do smp_rank (and peer_rank) correspond to what I would get from MPI_Comm_rank? I
hope so, because I was aiming for rank 5.
How do I know if I'm sending or receiving? They should all be sends.
What are all the lines like
[n2:10664] [Rank 7]?
What this seems to show is very odd.
First, my code thinks there are 3 outstanding Isends. Does this report
include requests that have become inactive (because they completed)?
Second, during normal operations rank 5 does not talk to ranks 1-4.
I did put an MPI_Barrier in just before shutdown, but the trace
information indicates rank 5 never gets to that step.
To provide fuller context, and maybe some clues for others who attempt
this, I first tried this with my non-debug-enabled libraries. I guessed
that the ranks were in the same order as the process IDs and attached
gdb to my R executable by giving it the process ID (once the system
reached its stuck state).
Accessing the communicator was tricky; it is reached via the comm variable defined in
the Rmpi library. Overall, the R executable starts and loads the
Rmpi library, which in turn loads and references the MPI library.
The communicators are defined in the Rmpi library as MPI_Comm *comm,
and the one I need is comm[1].
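In C terms, the picture I have in mind is roughly this (a sketch only, based on my reading of Rmpi; the init_comm_array helper and the initialization details are illustrative, not the actual Rmpi source):

#include <mpi.h>
#include <stdlib.h>

/* Sketch of how Rmpi appears to hold its communicators.  Only the
 * "MPI_Comm *comm" declaration and the fact that comm[1] is the one I
 * need come from what I actually saw; the rest is illustrative.      */
static MPI_Comm *comm;

static void init_comm_array(int n)    /* hypothetical helper, for illustration */
{
    comm = (MPI_Comm *) calloc((size_t) n, sizeof(MPI_Comm));
    comm[1] = MPI_COMM_WORLD;         /* in my application, comm[1] is the
                                         communicator the requests were posted on */
}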
When I tried to reference it I got an error that there was no debugging
info. I reconfigured MPI with --enable-debug and rebuilt it (make
clean all install). Then I launched everything again; I did not rebuild
Rmpi against the debug libraries, though I installed the debug libraries
in the old location for the regular ones.
I still had problems:
(gdb) p comm[1]
cannot subscript something of type `<data variable, no debug info>'
The error message I got before building MPI with debug enabled was a bit different
and stronger.
I realized that comm was a symbol in Rmpi, which I had not built with
debug symbols. Since MPI_Comm should now be understood by the debugger,
I tried an explicit cast, which worked:
call mca_pml_ob1_dump(((MPI_Comm *) comm)[1], 1)
So I'm not entirely sure whether building a debug version of MPI was
necessary.
Ross
If you are right, all processes will report NONE, and the bug is somewhere
in-between your application and the MPI library. Otherwise, you might have some
not-yet-completed requests pending…
George.
On Apr 4, 2014, at 22:20 , Ross Boylan <r...@biostat.ucsf.edu> wrote:
On 4/4/2014 6:01 PM, Ralph Castain wrote:
It sounds like you don't have a balance between sends and recvs somewhere -
i.e., some apps send messages, but the intended recipient isn't issuing a recv
and waiting until the message has been received before exiting. If the
recipient leaves before the isend completes, then the isend will never complete
and the waitall will not return.
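A minimal sketch of that pattern (illustrative only, not your actual code; run with at least two ranks):

#include <mpi.h>

int main(int argc, char **argv)
{
    /* Illustration of the unmatched-isend hang: rank 1 exits without ever
     * posting a matching receive, so rank 0's request cannot complete and
     * MPI_Waitall never returns.  (A very small message might still complete
     * via eager buffering, hence the large payload.)                        */
    static int payload[1 << 20];
    int rank;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Isend(payload, 1 << 20, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        MPI_Waitall(1, &req, MPI_STATUSES_IGNORE);   /* hangs here */
    }

    MPI_Finalize();   /* rank 1 never receives; with a send still pending to it,
                         this program is erroneous and rank 0 blocks forever */
    return 0;
}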
I'm pretty sure the sends complete because I wait on something that can only be
computed after the sends complete, and I know I have that result.
My current theory is that my modifications to Rmpi are not properly tracking
all completed messages, resulting in it thinking there are outstanding messages
(and passing a positive count to the C-level MPI_Waitall with associated
garbagey arrays). But I haven't isolated the problem.
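If that theory is right, the failure mode looks roughly like this sketch (illustrative only, not the real Rmpi code; suspected_bug and the counts are made up):

#include <mpi.h>

/* Illustrative sketch of the suspected bookkeeping bug, not actual Rmpi code. */
static void suspected_bug(MPI_Comm comm, int peer)
{
    MPI_Request reqs[3];
    int n_outstanding = 3;   /* bookkeeping claims three Isends are pending... */
    int buf = 0;

    /* ...but only one was actually posted, so reqs[1] and reqs[2] hold garbage;
     * handing them to MPI_Waitall is erroneous and can hang.                   */
    MPI_Isend(&buf, 1, MPI_INT, peer, 0, comm, &reqs[0]);
    MPI_Waitall(n_outstanding, reqs, MPI_STATUSES_IGNORE);

    /* The fix is to keep the count in sync with the requests actually posted,
     * or to set unused/completed slots to MPI_REQUEST_NULL, which MPI_Waitall
     * skips over.                                                             */
}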
Ross
On Apr 4, 2014, at 5:20 PM, Ross Boylan <r...@biostat.ucsf.edu> wrote:
During shutdown of my application the processes issue a waitall, since they
have done some Isends. A couple of them never return from that call.
Could this be the result of some of the processes already being shutdown (the
processes with the problem were late in the shutdown sequence)? If so, what is
the recommended solution? A barrier?
The shutdown proceeds in stages, but the processes in question are not told to
shutdown until all the messages they have sent have been received. So there
shouldn't be any outstanding messages from them.
My reading of the manual is that Waitall with a count of 0 should return
immediately, not hang. Is that correct?
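In other words, I'd expect something as trivial as this sketch to return at once (waitall_on_nothing is just an illustrative name):

#include <mpi.h>

/* With count == 0, MPI_Waitall has nothing to complete and should return
 * immediately; the request array is never examined.                      */
static void waitall_on_nothing(void)
{
    MPI_Request dummy[1];   /* never filled in; count is 0 */
    MPI_Waitall(0, dummy, MPI_STATUSES_IGNORE);
}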
Running under R with openmpi 1.7.4.
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users