On Oct 19, 2007, at 9:29 AM, Marcin Skoczylas wrote:

Jeff Squyres wrote:
On Oct 18, 2007, at 9:24 AM, Marcin Skoczylas wrote:


I assume this could be because of:

$ /sbin/route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.125.17.0    *               255.255.255.0   U     0      0        0 eth1
192.168.12.0    *               255.255.255.0   U     0      0        0 eth1
161.254.0.0     *               255.255.0.0     U     0      0        0 eth1
default         192.125.17.1    0.0.0.0         UG    0      0        0 eth1


Actually, the configuration here is quite strange: this is not a private
address. The head node sits on a public address from the 192.125.17.0 net
(routable from outside), and the workers are on 192.168.12.0.

I have a very similar configuration that works just fine with OpenMPI. In my case the head node has three interfaces and the worker nodes each have two interfaces. The configuration is roughly:

master: eth0: 192.168.x.x, eth1 & eth2 bonded to 10.0.0.1
node2: eth0 & eth1 bonded to 10.0.0.2
nodeN: eth0 & eth1 bonded to 10.0.0.N

So our "outside" communication with the head node is on the 192.168 network and the internal communication is on the 10.0.0.x network.

In your case the "outside" communication is on the 192.125 network and the internal communication is on the 192.168 network.

The primary difference seems to be that you have all communication going over a single interface.
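
To illustrate the point, here is a rough Python sketch that maps an address to the subnet and interface it would use, based only on the routing table quoted above. The host addresses 192.125.17.10 and 192.168.12.5 are made up for the example (they are not from your cluster), and the /24 and /16 prefixes are just the Genmask values written in CIDR form.

```python
import ipaddress

# Subnets copied from the quoted /sbin/route output; every route uses eth1.
ROUTES = {
    "192.125.17.0/24": "eth1",
    "192.168.12.0/24": "eth1",
    "161.254.0.0/16": "eth1",
}

def classify(addr):
    """Return (subnet, interface, is_private) for addr.

    Returns None when no listed subnet contains the address, meaning
    only the default route would apply.
    """
    ip = ipaddress.ip_address(addr)
    for cidr, iface in ROUTES.items():
        net = ipaddress.ip_network(cidr)
        if ip in net:
            return cidr, iface, net.is_private
    return None

# Hypothetical addresses, one on the head node's net and one on the workers' net.
head = classify("192.125.17.10")
worker = classify("192.168.12.5")
print("head:  ", head)
print("worker:", worker)
```

Both lookups come back with eth1, one on a public subnet and one on a private subnet, which is the single-interface situation described above.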

I'm a little surprised there is any problem at all with OpenMPI & your configuration as my configuration is more complicated.

Michael
