Wow. I initially thought that all combinations would be equivalent, but in fact that is not the case... I kept the firewalls down during all the tests.
> - on node1, "mpirun --host node1,node2 ring_c"

Works.

> - on node1, "mpirun --host node1,node3 ring_c"
> - on node1, "mpirun --host node2,node3 ring_c"

Blocks after "Process 0 sent to 1".

> - on node1, "mpirun --host node1,node2,node3 ring_c"

Prints:

    Process 0 sending 10 to 1, tag 201 (3 processes in ring)
    Process 0 sent to 1
    Process 0 decremented value: 9

then blocks.

> Repeat all 4 from node2.

On node2:
- "mpirun --host node2,node1 ring_c": OK
- "mpirun --host node2,node3 ring_c": blocks at the same point as above.
- "mpirun --host node1,node3 ring_c": blocks at the same point as above.
- "mpirun --host node1,node2,node3 ring_c": blocks at the same point mentioned above for the 3-host case.

I recompiled this test program with MPICH2 and got exactly the same issues at exactly the same points. There is really something wrong with that network...

-- 
Benjamin Bouvier