Hello,
I'm trying to run Open MPI (1.4.1) across two clusters. On each cluster, several interfaces are private.

On cluster1, nodes have 3 interfaces, and only 192.168.159.0/24 is visible from cluster2:

chicon-3
  eth0   inet addr:192.168.160.76  Bcast:192.168.160.255  Mask:255.255.255.0
  eth1   inet addr:192.168.159.76  Bcast:192.168.159.255  Mask:255.255.255.0
  myri0  inet addr:192.168.162.76  Bcast:192.168.162.255  Mask:255.255.255.0

On cluster2, nodes have 3 interfaces, and only 172.24.110.0/17 is visible from cluster1:

netgdx-8
  eth0   inet addr:172.24.190.8  Bcast:172.24.191.255  Mask:255.255.192.0
  eth1   inet addr:172.24.110.8  Bcast:172.24.127.255  Mask:255.255.128.0
  eth2   inet addr:172.24.240.8  Bcast:172.24.255.255  Mask:255.255.192.0

So I'm using this to declare all the other networks as private:

  mpirun -machinefile ~/gridnodes --mca opal_net_private_ipv4 "192.168.162.0/24\;192.168.160.0/24\;172.24.192.0/18\;172.24.128.0/18" ./alltoall

But this doesn't work: processes on netgdx-8 still try to reach the private 192.168.160.0/24 address of chicon-3:

[netgdx-8][[64214,1],5][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.160.76 failed: No route to host (113)
[netgdx-8][[64214,1],5][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.160.76 failed: No route to host (113)
[netgdx-8][[64214,1],5][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.160.76 failed: No route to host (113)
[netgdx-8][[64214,1],5][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.160.76 failed: No route to host (113)
[netgdx-8][[64214,1],4][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.160.76 failed: No route to host (113)

The following patch works for me:

diff -u ompi/mca/btl/tcp/btl_tcp_proc.c.orig ompi/mca/btl/tcp/btl_tcp_proc.c
--- ompi/mca/btl/tcp/btl_tcp_proc.c.orig	2010-03-23 14:01:28.000000000 +0100
+++ ompi/mca/btl/tcp/btl_tcp_proc.c	2010-03-23 14:01:50.000000000 +0100
@@ -496,7 +496,7 @@
                                 local_interfaces[i]->ipv4_netmask)) {
                     weights[i][j] = CQ_PRIVATE_SAME_NETWORK;
                 } else {
-                    weights[i][j] = CQ_PRIVATE_DIFFERENT_NETWORK;
+                    weights[i][j] = CQ_NO_CONNECTION;
                 }
                 best_addr[i][j] = peer_interfaces[j]->ipv4_endpoint_addr;
             }

Why does Open MPI try to connect two different private networks when "public" networks exist? Is it a bug, or am I missing something?

-- 
Nicolas NICLAUSSE                          Service DREAM
INRIA Sophia Antipolis                     http://www-sop.inria.fr/
2004 route des lucioles - BP 93            Tel: (33/0) 4 92 38 76 93
06902 SOPHIA-ANTIPOLIS cedex (France)      Fax: (33/0) 4 92 38 76 02
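
PS: to double-check the subnet arithmetic behind the opal_net_private_ipv4 list above, here is a small sketch using Python's standard ipaddress module. It only derives each interface's network from the address and netmask shown in the listing and tests it against the networks I declared private. The names INTERFACES, PRIVATE, network_of, is_private and report are made up for this illustration, and the script does not reproduce Open MPI's own classification logic.

```python
import ipaddress

# Addresses and netmasks copied from the interface listing in this message.
INTERFACES = {
    "chicon-3": {
        "eth0": ("192.168.160.76", "255.255.255.0"),
        "eth1": ("192.168.159.76", "255.255.255.0"),
        "myri0": ("192.168.162.76", "255.255.255.0"),
    },
    "netgdx-8": {
        "eth0": ("172.24.190.8", "255.255.192.0"),
        "eth1": ("172.24.110.8", "255.255.128.0"),
        "eth2": ("172.24.240.8", "255.255.192.0"),
    },
}

# Networks passed to opal_net_private_ipv4 on the mpirun command line.
PRIVATE = [
    ipaddress.ip_network(net)
    for net in (
        "192.168.162.0/24",
        "192.168.160.0/24",
        "172.24.192.0/18",
        "172.24.128.0/18",
    )
]


def network_of(addr, mask):
    """Return the network an interface sits on, given address and netmask."""
    return ipaddress.ip_network(f"{addr}/{mask}", strict=False)


def is_private(addr):
    """True if the address falls inside one of the declared private networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in PRIVATE)


def report():
    """List (host, interface, network, declared_private) for every interface."""
    rows = []
    for host, ifaces in INTERFACES.items():
        for name, (addr, mask) in ifaces.items():
            rows.append((host, name, str(network_of(addr, mask)), is_private(addr)))
    return rows


if __name__ == "__main__":
    for host, name, net, private in report():
        label = "private" if private else "not declared private"
        print(f"{host:9} {name:6} {net:18} {label}")
```

With the values above, only eth1 on each node falls outside the declared private networks, which is what I intended; the address in the error messages (192.168.160.76) is inside one of them.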