James,

I'm not 100% sure, but I think I might know what's wrong. I can reproduce something similar (oddly, it does not happen every time) if I activate my firewall and let all the traffic through (i.e. accept all connections). In short, I think the firewall (even when disabled) introduces some delay in the setup stage of the TCP connection and we "kind of" lose one of the messages. Let me find a high-delay cluster and I will take a look.

  Thanks,
    george.


On Fri, 10 Feb 2006, James Conway wrote:

I have copied the "MPI Tutorial: The cannonical ring program" from
<http://www.lam-mpi.org/tutorials/>. It compiles and runs fine on the
localhost (one CPU, one or more MPI processes). If I copy it to a
remotehost, it does one round of passing the 'tag' then stalls. I
modified the print statements a bit to see where in the code it
stalls, but the loop hasn't changed. This is what I see happening:
1. Process 0 successfully kicks off the pass-around by sending the
tag to the next process (1), and then enters the loop where it waits
for the tag to come back.
2. Process 1 enters the loop, receives the tag and passes it on (back
to process 0 since this is a ring of 2 players only).
3. Process 0 successfully receives the tag, decrements it, and calls
the next send (MPI_Send) but it doesn't return from this. I have a
print statement right after (with fflush) but there is no output. The
CPU is maxed out on both the local and remote hosts; I assume some
kind of polling.
4. Needless to say, Process 1 never reports receipt of the tag.

Since process 0 succeeds in calling MPI_Send before the loop, and in
calling MPI_Recv at the start of the loop, the communications appear
to be working. Likewise, process 1 succeeds in receiving and sending
within the loop. However, if it's significant, these calls work one
time for each process - the second time MPI_Send is called by process
0, there is a hang.

I am using Mac OSX 10.4.4 and gcc 4.0.1 on both systems, with OpenMPI
1.0.1 installed (compiled from sources). The small tutorial code is
below (I hope it's OK to include here), with the few printf mods that
I made.


"We must accept finite disappointment, but we must never lose infinite
hope."
                                  Martin Luther King
