I've been trying to get Open MPI to work on Amazon's EC2, but I've been running into a communication problem. Here is the source (a typical Hello World):
    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int myid, numprocs;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        printf("%d of %d: Hello world!\n", myid, numprocs);
        MPI_Finalize();
        return 0;
    }
After compiling it, I copied the binary over to the other machine and tried running it with:

    mpirun -v --mca btl self,tcp -np 4 --machinefile machines /mnt/mpihw

which produces:

    --------------------------------------------------------------------------
    Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
    If you specified the use of a BTL component, you may have forgotten a
    component (such as "self") in the list of usable components.
    --------------------------------------------------------------------------
    --------------------------------------------------------------------------
    It looks like MPI_INIT failed for some reason; your parallel process is
    likely to abort. There are many reasons that a parallel process can fail
    during MPI_INIT; some of which are due to configuration or environment
    problems. This failure appears to be an internal failure; here's some
    additional information (which may only be relevant to an Open MPI
    developer):

      PML add procs failed
      --> Returned "Unreachable" (-12) instead of "Success" (0)
    --------------------------------------------------------------------------
    *** An error occurred in MPI_Init
    *** before MPI was initialized
    *** MPI_ERRORS_ARE_FATAL (goodbye)
    --------------------------------------------------------------------------
    Process 0.1.3 is unable to reach 0.1.0 for MPI communication.
    If you specified the use of a BTL component, you may have forgotten a
    component (such as "self") in the list of usable components.
    --------------------------------------------------------------------------
    --------------------------------------------------------------------------
    It looks like MPI_INIT failed for some reason; your parallel process is
    likely to abort. There are many reasons that a parallel process can fail
    during MPI_INIT; some of which are due to configuration or environment
    problems.
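In case it's relevant, the machines file is just a plain-text hostfile with one host per line. Mine looks roughly like this (the hostnames are the two instances' internal names from the output above; the slots counts here are illustrative):

    domU-12-31-39-00-B2-23 slots=2
    domU-12-31-39-02-F5-13 slots=2

With -np 4 and two slots per host, Open MPI should place two ranks on each instance.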
    This failure appears to be an internal failure; here's some additional
    information (which may only be relevant to an Open MPI developer):

      PML add procs failed
      --> Returned "Unreachable" (-12) instead of "Success" (0)
    --------------------------------------------------------------------------
    *** An error occurred in MPI_Init
    *** before MPI was initialized
    *** MPI_ERRORS_ARE_FATAL (goodbye)
    [domU-12-31-39-02-F5-13:03965] [0,0,0]-[0,1,0] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
    [domU-12-31-39-02-F5-13:03965] [0,0,0]-[0,1,2] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
    mpirun noticed that job rank 0 with PID 3653 on node domU-12-31-39-00-B2-23 exited on signal 15 (Terminated).
    1 additional process aborted (not shown)

AFAIK, the machines are able to communicate with each other on any port you like; it's only MPI that fails. Any idea what's wrong?