I've been trying to get Open MPI to work on Amazon's EC2, but I've been
running into a communication problem. Here is the source (a typical
"Hello, World"):


#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int myid, numprocs;

    MPI_Init(&argc, &argv);                    /* start the MPI runtime */
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);  /* total number of processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);      /* rank of this process */

    printf("%d of %d: Hello world!\n", myid, numprocs);

    MPI_Finalize();                            /* shut MPI down before exiting */
    return 0;
}


After compiling it, I copied it over to the other machine and tried
running it with:

mpirun -v --mca btl self,tcp -np 4 --machinefile machines /mnt/mpihw

which produces:

--------------------------------------------------------------------------
Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
Process 0.1.3 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
[domU-12-31-39-02-F5-13:03965] [0,0,0]-[0,1,0] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
[domU-12-31-39-02-F5-13:03965] [0,0,0]-[0,1,2] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
mpirun noticed that job rank 0 with PID 3653 on node domU-12-31-39-00-B2-23 exited on signal 15 (Terminated).
1 additional process aborted (not shown)



AFAIK, the machines can reach each other on any port; only MPI can't
get through. Any idea what's wrong?

