Hello,

I'm trying to run Open MPI (1.4.1) across two clusters. On each cluster,
several of the interfaces are on private networks.

On cluster1, nodes have 3 interfaces, and only 192.168.159.0/24 (eth1) is
visible from cluster2.

chicon-3
eth0     inet addr:192.168.160.76  Bcast:192.168.160.255  Mask:255.255.255.0
eth1     inet addr:192.168.159.76  Bcast:192.168.159.255  Mask:255.255.255.0
myri0    inet addr:192.168.162.76  Bcast:192.168.162.255  Mask:255.255.255.0

On cluster2, nodes have 3 interfaces, and only 172.24.0.0/17 (eth1) is
visible from cluster1.

netgdx-8
eth0  inet addr:172.24.190.8  Bcast:172.24.191.255  Mask:255.255.192.0
eth1  inet addr:172.24.110.8  Bcast:172.24.127.255  Mask:255.255.128.0
eth2  inet addr:172.24.240.8  Bcast:172.24.255.255  Mask:255.255.192.0

So I'm using this to declare all the other networks as private:

mpirun -machinefile ~/gridnodes --mca opal_net_private_ipv4 \
  "192.168.162.0/24\;192.168.160.0/24\;172.24.192.0/18\;172.24.128.0/18" \
  ./alltoall

But this doesn't work: netgdx-8 still tries to connect to the private eth0
address of chicon-3:

[netgdx-8][[64214,1],5][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.160.76 failed: No route to host (113)
[netgdx-8][[64214,1],5][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.160.76 failed: No route to host (113)
[netgdx-8][[64214,1],5][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.160.76 failed: No route to host (113)
[netgdx-8][[64214,1],5][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.160.76 failed: No route to host (113)
[netgdx-8][[64214,1],4][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.160.76 failed: No route to host (113)
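
To rule out a typo in the netmasks, the CIDR blocks passed to
opal_net_private_ipv4 can be checked against the interface addresses listed
above. This is only a minimal sketch: the helper names (parse_ipv4,
prefix_to_mask, addr_in_cidr) are my own for illustration and are not taken
from the Open MPI sources.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: parse dotted-quad IPv4 text into a host-order
 * 32-bit value. Returns 0 on success, -1 on malformed input. */
static int parse_ipv4(const char *text, uint32_t *out)
{
    unsigned a, b, c, d;
    char trailing;

    /* Exactly four fields must match; a fifth match means trailing junk. */
    if (sscanf(text, "%u.%u.%u.%u%c", &a, &b, &c, &d, &trailing) != 4)
        return -1;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return -1;
    *out = ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
    return 0;
}

/* Hypothetical helper: netmask for a prefix length, e.g. 24 -> 255.255.255.0. */
static uint32_t prefix_to_mask(unsigned prefix_len)
{
    assert(prefix_len <= 32);
    if (prefix_len == 0)
        return 0;                 /* avoid an undefined 32-bit shift */
    return (uint32_t)(0xFFFFFFFFu << (32 - prefix_len));
}

/* Hypothetical helper: 1 if addr lies inside net/prefix_len, 0 if it does
 * not, -1 if either address is malformed or the prefix is out of range. */
static int addr_in_cidr(const char *addr, const char *net, unsigned prefix_len)
{
    uint32_t a, n, mask;

    if (prefix_len > 32 || parse_ipv4(addr, &a) != 0 || parse_ipv4(net, &n) != 0)
        return -1;
    mask = prefix_to_mask(prefix_len);
    return (a & mask) == (n & mask);
}
```

With the addresses above, addr_in_cidr("192.168.160.76", "192.168.160.0", 24)
and addr_in_cidr("172.24.190.8", "172.24.128.0", 18) both report a match,
while the two eth1 addresses (192.168.159.76 and 172.24.110.8) fall outside
all four private blocks, so the list itself looks consistent with the
interfaces.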

The following patch works for me:

diff -u ompi/mca/btl/tcp/btl_tcp_proc.c.orig ompi/mca/btl/tcp/btl_tcp_proc.c
--- ompi/mca/btl/tcp/btl_tcp_proc.c.orig        2010-03-23 14:01:28.000000000 +0100
+++ ompi/mca/btl/tcp/btl_tcp_proc.c     2010-03-23 14:01:50.000000000 +0100
@@ -496,7 +496,7 @@
                                 local_interfaces[i]->ipv4_netmask)) {
                         weights[i][j] = CQ_PRIVATE_SAME_NETWORK;
                     } else {
-                        weights[i][j] = CQ_PRIVATE_DIFFERENT_NETWORK;
+                        weights[i][j] = CQ_NO_CONNECTION;
                     }
                     best_addr[i][j] = peer_interfaces[j]->ipv4_endpoint_addr;
                 }


Why does Open MPI try to connect across different private networks when
"public" networks exist? Is this a bug, or am I missing something?

-- 
Nicolas NICLAUSSE                          Service DREAM
INRIA Sophia Antipolis                     http://www-sop.inria.fr/
2004 route des lucioles - BP 93            Tel: (33/0) 4 92 38 76 93
06902  SOPHIA-ANTIPOLIS cedex (France)     Fax: (33/0) 4 92 38 76 02
