On 4/9/2014 5:26 PM, Ross Boylan wrote:
On Fri, 2014-04-04 at 22:40 -0400, George Bosilca wrote:
Ross,
I’m not familiar with the R implementation you are using, but bear with me and
I will explain how you can ask Open MPI for the list of all pending requests
on a process. Disclosure: This is Open MPI deep voodoo, an extreme way to debug
applications that might save you quite some time.
The only thing you need is the communicator you posted your requests into, or
at least a pointer to it. Then you attach to your process (or processes) with
your preferred debugger and call
mca_pml_ob1_dump(struct ompi_communicator_t* comm, int verbose)
With gdb this should look like “call mca_pml_ob1_dump(my_comm, 1)”. This will
dump human readable information about all the requests pending on a
communicator (both sends and receives).
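A sketch of what that session might look like (the PID and my_comm below are placeholders for your own stuck process and communicator pointer, not anything specific to your setup):
$ gdb -p <pid of the stuck rank>
(gdb) call mca_pml_ob1_dump(my_comm, 1)
(gdb) detach
(gdb) quit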
Thank you so much for the tip. After inserting a barrier failed to help,
I decided to try this. After much messing around (details below), here is what I got:
BTL SM 0x7f615dea9660 endpoint 0x3c15d90 [smp_rank 5] [peer_rank 0]
BTL SM 0x7f615dea9660 endpoint 0x3b729e0 [smp_rank 5] [peer_rank 1]
BTL SM 0x7f615dea9660 endpoint 0x3b72ad0 [smp_rank 5] [peer_rank 2]
BTL SM 0x7f615dea9660 endpoint 0x3c06e60 [smp_rank 5] [peer_rank 3]
BTL SM 0x7f615dea9660 endpoint 0x3c06f50 [smp_rank 5] [peer_rank 4]
[n2:10664] [Rank 0]
[n2:10664] [Rank 1]
[n2:10664] [Rank 2]
[n2:10664] [Rank 3]
[n2:10664] [Rank 4]
[n2:10664] [Rank 5]
[n2:10664] [Rank 6]
[n2:10664] [Rank 7]
[n2:10664] [Rank 8]
[n2:10664] [Rank 9]
[n2:10664] [Rank 10]
[n2:10664] [Rank 11]
[n2:10664] [Rank 12]
[n2:10664] [Rank 13]
After tracing through the code, things look different, though still odd.
First, the output above is out of sequence.
Second, I think the BTLs are transport mechanisms, or something similar,
not actual messages.
If there were messages, they would be listed underneath. There aren't any.
So I think this shows there is nothing to wait on, as I suspected.
Except I seem to be missing info for the remote ranks.
Is there any way a request can be completed absent a Wait or Test on the
request?
Third, I'm seeing BTLs listed for one rank I do communicate with (rank 0) and
four ranks I do not communicate with. Ranks 0-5 are local and the
rest are remote. Rank 5 does communicate with all the remote nodes, but
absolutely nothing is listed for them. When I trace from
bml_btl->btl->btl_dump(bml_btl->btl, bml_btl->btl_endpoint, verbose)
in mca_pml_ob1_dump, I end up (gdb in emacs) at
void mca_btl_base_dump(
    struct mca_btl_base_module_t* btl,
    struct mca_btl_base_endpoint_t* endpoint,
    int verbose)
{
    /* empty body: nothing is ever printed for these endpoints */
}
The function is a no-op, which sort of explains why I'm seeing nothing
for those ranks, but it doesn't seem quite right.
The pending messages are likely to be to the remote ranks.
Ross
In-sequence output:
[n2:11695] [Rank 0]
BTL SM 0x7fa37e1b4660 endpoint 0x31a7d70 [smp_rank 5] [peer_rank 0]
[n2:11695] [Rank 1]
BTL SM 0x7fa37e1b4660 endpoint 0x31049e0 [smp_rank 5] [peer_rank 1]
[n2:11695] [Rank 2]
BTL SM 0x7fa37e1b4660 endpoint 0x3104ad0 [smp_rank 5] [peer_rank 2]
[n2:11695] [Rank 3]
BTL SM 0x7fa37e1b4660 endpoint 0x3198e60 [smp_rank 5] [peer_rank 3]
[n2:11695] [Rank 4]
BTL SM 0x7fa37e1b4660 endpoint 0x3198f50 [smp_rank 5] [peer_rank 4]
[n2:11695] [Rank 5]
[n2:11695] [Rank 6]
[n2:11695] [Rank 7]
[n2:11695] [Rank 8]
[n2:11695] [Rank 9]
[n2:11695] [Rank 10]
[n2:11695] [Rank 11]
[n2:11695] [Rank 12]
[n2:11695] [Rank 13]
Not entirely human readable if the human is me!
Do smp_rank (and peer_rank) correspond to what I would get from MPI_Comm_rank? I
hope so, because I was aiming for rank 5.
How do I know if I'm sending or receiving? They should all be sends.
What are all the lines like
[n2:10664] [Rank 7]?
What this seems to show is very odd.
First, my code thinks there are 3 outstanding Isends. Does this report
include requests that have become inactive (because they completed)?
Second, during normal operations rank 5 does not talk to ranks 1-4.
I did put an MPI_Barrier in just before shutdown, but the trace
information indicates rank 5 never gets to that step.
To provide fuller context, and maybe some clues for others who attempt
this, I first tried this with my non-debug-enabled libraries. I guessed
that the ranks were in the same order as the process IDs and attached
gdb to my R executable by giving it the process ID (once the system
reached its stuck state).
Accessing the communicator was tricky; it is reached via the comm variable defined in
the Rmpi library. Overall, the R executable starts and loads the
Rmpi library, which in turn loads and references the MPI library.
The communicators are defined in the Rmpi library as MPI_Comm *comm,
and the one I need is comm[1].
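In C terms, the picture I have in mind is roughly this (a sketch only, based on my reading of Rmpi; the init_comm_array helper and the initialization details are illustrative, not the actual Rmpi source):

#include <mpi.h>
#include <stdlib.h>

/* Sketch of how Rmpi appears to hold its communicators.  Only the
 * "MPI_Comm *comm" declaration and the fact that comm[1] is the one I
 * need come from what I actually saw; the rest is illustrative.      */
static MPI_Comm *comm;

static void init_comm_array(int n)    /* hypothetical helper, for illustration */
{
    comm = (MPI_Comm *) calloc((size_t) n, sizeof(MPI_Comm));
    comm[1] = MPI_COMM_WORLD;         /* in my application, comm[1] is the
                                         communicator the requests were posted on */
}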
When I tried to reference it I got an error that there was no debugging
info. I reconfigured MPI with --enable-debug and rebuilt it (make
clean all install). Then I launched everything again; I did not rebuild
Rmpi against the debug libraries, though I installed the debug libraries
in the old location for the regular ones.
I still had problems:
(gdb) p comm[1]
cannot subscript something of type `<data variable, no debug info>'
The error message I got before building MPI with debug enabled was a bit different
and stronger.
I realized that comm was a symbol in Rmpi, which I had not built with
debug symbols. Since MPI_Comm should now be understood by the debugger,
I tried an explicit cast, which worked:
call mca_pml_ob1_dump(((MPI_Comm *) comm)[1], 1)
So I'm not entirely sure whether building a debug version of MPI was
necessary.
Ross
If you are right, all processes will report NONE, and the bug is somewhere
in-between your application and the MPI library. Otherwise, you might have some
not-yet-completed requests pending…
George.
On Apr 4, 2014, at 22:20 , Ross Boylan <r...@biostat.ucsf.edu> wrote:
On 4/4/2014 6:01 PM, Ralph Castain wrote:
It sounds like you don't have a balance between sends and recvs somewhere -
i.e., some apps send messages, but the intended recipient isn't issuing a recv
and waiting until the message has been received before exiting. If the
recipient leaves before the isend completes, then the isend will never complete
and the waitall will not return.
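A minimal sketch of that pattern (illustrative only, not your actual code; run with at least two ranks):

#include <mpi.h>

int main(int argc, char **argv)
{
    /* Illustration of the unmatched-isend hang: rank 1 exits without ever
     * posting a matching receive, so rank 0's request cannot complete and
     * MPI_Waitall never returns.  (A very small message might still complete
     * via eager buffering, hence the large payload.)                        */
    static int payload[1 << 20];
    int rank;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Isend(payload, 1 << 20, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        MPI_Waitall(1, &req, MPI_STATUSES_IGNORE);   /* hangs here */
    }

    MPI_Finalize();   /* rank 1 never receives; with a send still pending to it,
                         this program is erroneous and rank 0 blocks forever */
    return 0;
}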
I'm pretty sure the sends complete because I wait on something that can only be
computed after the sends complete, and I know I have that result.
My current theory is that my modifications to Rmpi are not properly tracking
all completed messages, resulting in it thinking there are outstanding messages
(and passing a positive count to the C-level MPI_Waitall with associated
garbagey arrays). But I haven't isolated the problem.
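If that theory is right, the failure mode looks roughly like this sketch (illustrative only, not the real Rmpi code; suspected_bug and the counts are made up):

#include <mpi.h>

/* Illustrative sketch of the suspected bookkeeping bug, not actual Rmpi code. */
static void suspected_bug(MPI_Comm comm, int peer)
{
    MPI_Request reqs[3];
    int n_outstanding = 3;   /* bookkeeping claims three Isends are pending... */
    int buf = 0;

    /* ...but only one was actually posted, so reqs[1] and reqs[2] hold garbage;
     * handing them to MPI_Waitall is erroneous and can hang.                   */
    MPI_Isend(&buf, 1, MPI_INT, peer, 0, comm, &reqs[0]);
    MPI_Waitall(n_outstanding, reqs, MPI_STATUSES_IGNORE);

    /* The fix is to keep the count in sync with the requests actually posted,
     * or to set unused/completed slots to MPI_REQUEST_NULL, which MPI_Waitall
     * skips over.                                                             */
}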
Ross
On Apr 4, 2014, at 5:20 PM, Ross Boylan <r...@biostat.ucsf.edu> wrote:
During shutdown of my application the processes issue a waitall, since they
have done some Isends. A couple of them never return from that call.
Could this be the result of some of the processes already being shutdown (the
processes with the problem were late in the shutdown sequence)? If so, what is
the recommended solution? A barrier?
The shutdown proceeds in stages, but the processes in question are not told to
shutdown until all the messages they have sent have been received. So there
shouldn't be any outstanding messages from them.
My reading of the manual is that Waitall with a count of 0 should return
immediately, not hang. Is that correct?
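In other words, I'd expect something as trivial as this sketch to return at once (waitall_on_nothing is just an illustrative name):

#include <mpi.h>

/* With count == 0, MPI_Waitall has nothing to complete and should return
 * immediately; the request array is never examined.                      */
static void waitall_on_nothing(void)
{
    MPI_Request dummy[1];   /* never filled in; count is 0 */
    MPI_Waitall(0, dummy, MPI_STATUSES_IGNORE);
}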
Running under R with openmpi 1.7.4.
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users