Wow. I initially thought that all combinations would behave the same,
but that is not the case.
I kept the firewalls down during all the tests.

> - on node1, "mpirun --host node1,node2 ring_c"
Works.

> - on node1, "mpirun --host node1,node3 ring_c"
> - on node1, "mpirun --host node2,node3 ring_c"
Both block after "Process 0 sent to 1".

> - on node1, "mpirun --host node1,node2,node3 ring_c"
"Process 0 sending 10 to 1, tag 201 (3 processes in ring)
Process 0 sent to 1
Process 0 decremented value: 9" then blocks

> Repeat all 4 from node2.
On node2:
- "mpirun --host node2,node1 ring_c": OK
- "mpirun --host node2,node3 ring_c": blocks at the same point as above.
- "mpirun --host node1,node3 ring_c": blocks at the same point as above.
- "mpirun --host node1,node2,node3 ring_c": blocks at the same point as in
the three-host case above.

I recompiled this test program with MPICH2 and got exactly the same issues at
the same points.
There is really something wrong with that network...
--
Benjamin Bouvier
