Brian et al,

Original thread was "[O-MPI users] Firewall ports and Mac OS X 10.4.4"

On Feb 9, 2006, at 11:26 PM, Brian Barrett wrote:

Open MPI uses random port numbers for all it's communication.
(etc)

Thanks for the explanation. I will live with the open Firewall, and look at the ipfw docs for writing a script.

Now I have a more "core" OpenMPI problem, which may be just unfamiliarity on my part. I seem to have the environment variables set up alright though - the code runs, but doesn't complete.

I have copied the "MPI Tutorial: The cannonical ring program" from <http://www.lam-mpi.org/tutorials/>. It compiles and runs fine on the localhost (one CPU, one or more MPI processes). If I copy it to a remotehost, it does one round of passing the 'tag' then stalls. I modified the print statements a bit to see where in the code it stalls, but the loop hasn't changed. This is what I see happening:

1. Process 0 successfully kicks off the pass-around by sending the tag to the next process (1), and then enters the loop where it waits for the tag to come back.
2. Process 1 enters the loop, receives the tag and passes it on (back to process 0 since this is a ring of 2 players only).
3. Process 0 successfully receives the tag, decrements it, and calls the next send (MPI_Send) but it doesn't return from this. I have a print statement right after (with fflush) but there is no output. The CPU is maxed out on both the local and remote hosts, I assume some kind of polling.
4. Needless to say, Process 1 never reports receipt of the tag.

Output (with a little re-ordering to make sense) is:
   mpirun --hostfile my_mpi_hosts --np 2 mpi_test1
   Process rank 0: size = 2
   Process rank 1: size = 2
   Enter the number of times around the ring: 5

   Process 0 doing first send of '4' to 1
   Process 0 finished sending, now entering loop

   Process 0 waiting to receive from 1

   Process 1 waiting to receive from 0
   Process 1 received '4' from 0
   Process 1 sending '4' to 0
   Process 1 finished sending
   Process 1 waiting to receive from 0

   Process 0 received '4' from 1
   >>Process 0 decremented num
   Process 0 sending '3' to 1
   !---- nothing more - hangs at 100% cpu until ctrl-
   !---- should see "Process 0 finished sending"

Since process 0 succeeds in calling MPI_Send before the loop, and in calling MPI_Recv at the start of the loop, the communications appear to be working. Likewise, process 1 succeeds in receiving and sending within the loop. However, if it's significant, these calls work one time for each process - the second time MPI_Send is called by process 0, there is a hang.
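As a sanity check on how many sends should complete in a healthy run, here is a minimal plain-C sketch (no MPI calls at all) that replays the loop logic of the tutorial program below. The function name count_expected_sends is my own invention for illustration, not part of the tutorial or of Open MPI, and the sketch assumes at least 2 processes and at least 2 times around the ring.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of the tutorial or of Open MPI):
 * replays the ring program's control flow without any MPI calls and
 * returns how many MPI_Send calls should complete in total.
 * Assumes size >= 2 processes and laps >= 2 times around the ring.
 */
int count_expected_sends(int size, int laps)
{
  assert(size >= 2 && laps >= 2);

  int num = laps - 1;  /* rank 0 decrements once before its first send */
  int sends = 1;       /* rank 0's first send, before the loop */
  int rank = 1;        /* the message is now on its way to rank 1 */
  int exited = 0;      /* number of processes that have left the loop */

  while (exited < size) {
    /* "rank" receives the message at this point */
    if (rank == 0)
      num--;           /* only rank 0 decrements */
    sends++;           /* every process forwards what it received */
    if (num == 0)
      exited++;        /* after forwarding the 0, this process quits */
    rank = (rank + 1) % size;
  }

  return sends;
}
```

For the run above (2 processes, 5 times around the ring) count_expected_sends(2, 5) gives 10, whereas in my output only the first two sends return and the third one hangs.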

I am using Mac OSX 10.4.4 and gcc 4.0.1 on both systems, with OpenMPI 1.0.1 installed (compiled from sources). The small tutorial code is below (I hope it's OK to include here), with the few printf mods that I made.

Any pointers appreciated!

James Conway

----------------------------------------------------------------------
James Conway, PhD.,
Department of Structural Biology
University of Pittsburgh School of Medicine
Biomedical Science Tower 3, Room 2047
3501 5th Ave
Pittsburgh, PA 15260
U.S.A.
Phone: +1-412-383-9847
Fax:   +1-412-648-8998
Email: jxc...@pitt.edu
Web:   <http://www.pitt.edu/~jxc100/> (under construction)
----------------------------------------------------------------------


/*
 * Open Systems Lab
 * http://www.lam-mpi.org/tutorials/
 * Indiana University
 *
 * MPI Tutorial
 * The cannonical ring program
 *
 * Mail questions regarding tutorial material to m...@lam-mpi.org
 */

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[]);


int main(int argc, char *argv[])
{
  MPI_Status status;
  int num, rank, size;

  /* Start up MPI */

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

/*
Arbitrarily choose 201 to be our tag.  Calculate the
rank of the next process in the ring.  Use the modulus
operator so that the last process "wraps around" to rank
zero.
*/

  const int tag  = 201;
  const int next = (rank + 1) % size;
  const int from = (rank + size - 1) % size;

  printf("Process rank %d: size = %d\n", rank, size);

/*
If we are the "console" process, get an integer from the user
to specify how many times we want to go around the ring
*/

  if (rank == 0) {
    printf("Enter the number of times around the ring: ");
    scanf("%d", &num);
    --num;

    printf("Process %d doing first send of '%d' to %d\n", rank, num, next);
    MPI_Send(&num, 1, MPI_INT, next, tag, MPI_COMM_WORLD);
    printf("Process %d finished sending, now entering loop\n", rank);
    fflush(stdout);
  }

/*
Pass the message around the ring.  The exit mechanism works
as follows: the message (a positive integer) is passed
around the ring.  Each time it passes rank 0, it is decremented.
When each process receives the 0 message, it passes it on
to the next process and then quits.  By passing the 0 first,
every process gets the 0 message and can quit normally.
*/

  while (1) {

    printf("Process %d waiting to receive from %d\n", rank, from);
    MPI_Recv(&num, 1, MPI_INT, from, tag, MPI_COMM_WORLD, &status);
    printf("Process %d received '%d' from %d\n", rank, num, from);
    fflush(stdout);

    if (rank == 0) {
      num--;
      printf(">>Process 0 decremented num\n");
      fflush(stdout);
    }

    printf("Process %d sending '%d' to %d\n", rank, num, next);
    MPI_Send(&num, 1, MPI_INT, next, tag, MPI_COMM_WORLD);
    printf("Process %d finished sending\n", rank);
    fflush(stdout);

    if (num == 0) {
      printf("Process %d exiting\n", rank);
      fflush(stdout);
      break;
    }
  }

// The last process does one extra send to process 0, which needs
// to be received before the program can exit

  printf("Process %d after loop\n", rank);
  fflush(stdout);

  if (rank == 0)
    MPI_Recv(&num, 1, MPI_INT, from, tag, MPI_COMM_WORLD, &status);

// Quit

  MPI_Finalize();
  return 0;
}


