It sounds like the IOF (I/O forwarding) messages aren't getting relayed
between the daemons. Note that the daemon on each node has to be able
to send TCP messages to every other node, not just to the node running
mpirun.
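A rough way to test that node-to-node TCP requirement is a small
connection probe run from each node in turn. This is only a sketch: the
function names are made up for illustration, and the port is an
assumption. The daemons normally pick their ports dynamically, so a
successful connection on one fixed port (say 22 for ssh) does not prove
that a firewall isn't blocking the ports the daemons actually use.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        # create_connection resolves the address and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def check_hosts(hosts, port, timeout=3.0):
    """Map each host to the result of a connection attempt from this machine."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

For example, calling check_hosts(["10.1.2.122", "10.1.2.123",
"10.1.2.125"], port=22) on 10.1.2.126, and then the equivalent on each
of the other three nodes, would show whether any pair of nodes cannot
reach each other at all.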
A couple of things you can do to check:
1. -mca routed direct - this will send all messages directly instead
of relaying them across the daemons
2. --leave-session-attached - this will let you see any errors
reported by the daemons, including those from attempting to relay
messages
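To make the two suggestions concrete, here is a small sketch that
assembles the corresponding command lines for the failing four-host
run. The host addresses and the hello-mpi program name come from the
original post below; the helper name mpirun_argv is made up for
illustration, and nothing is actually launched.

```python
import shlex

# The four hosts from the failing run in the original post.
HOSTS = ["10.1.2.126", "10.1.2.122", "10.1.2.123", "10.1.2.125"]


def mpirun_argv(extra_args, hosts=HOSTS, program="hello-mpi"):
    """Build an mpirun argument list with extra options placed before -H."""
    return ["mpirun", *extra_args, "-H", ",".join(hosts), program]


# Option 1: send messages directly instead of relaying across the daemons.
direct = mpirun_argv(["-mca", "routed", "direct"])
# Option 2: keep the daemons attached so their error output is visible.
attached = mpirun_argv(["--leave-session-attached"])

print(shlex.join(direct))
print(shlex.join(attached))
```

The printed lines can be pasted into a shell as they are. If the first
one produces the expected output while the plain run does not, that
points at the daemon-to-daemon relay rather than at the application.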
Ralph
On Jul 29, 2009, at 1:19 PM, David Doria wrote:
I wrote a simple program to display "hello world" from each process.
When I run this (126 - my machine, 122, and 123), everything works
fine:
[doriad@daviddoria MPITest]$ mpirun -H 10.1.2.126,10.1.2.122,10.1.2.123 hello-mpi
From process 1 out of 3, Hello World!
From process 2 out of 3, Hello World!
From process 3 out of 3, Hello World!
When I run this (126 - my machine, 122, and 125), everything works
fine:
[doriad@daviddoria MPITest]$ mpirun -H 10.1.2.126,10.1.2.122,10.1.2.125 hello-mpi
From process 2 out of 3, Hello World!
From process 1 out of 3, Hello World!
From process 3 out of 3, Hello World!
When I run this (126 - my machine, 123, and 125), everything works
fine:
[doriad@daviddoria MPITest]$ mpirun -H 10.1.2.126,10.1.2.123,10.1.2.125 hello-mpi
From process 2 out of 3, Hello World!
From process 1 out of 3, Hello World!
From process 3 out of 3, Hello World!
However, when I run this (126 - my machine, 122, 123, AND 125), I
get no output at all.
Is there any way to check what is going on, or does anyone know why
that would happen? I'm using Open MPI 1.3.3.
Thanks,
David