On Oct 19, 2007, at 9:29 AM, Marcin Skoczylas wrote:
Jeff Squyres wrote:
On Oct 18, 2007, at 9:24 AM, Marcin Skoczylas wrote:
I assume this could be because of:
$ /sbin/route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.125.17.0    *               255.255.255.0   U     0      0        0 eth1
192.168.12.0    *               255.255.255.0   U     0      0        0 eth1
161.254.0.0     *               255.255.0.0     U     0      0        0 eth1
default         192.125.17.1    0.0.0.0         UG    0      0        0 eth1
Actually, the configuration here is quite strange: this is not a
private address. The head node sits on a public address from the
192.125.17.0 net (routable from outside), and the workers are on
192.168.12.0.
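As a quick sanity check on the public/private distinction described here, the following is a minimal sketch using Python's standard `ipaddress` module. The prefix lengths are derived from the Genmask column of the routing table; the `ROUTES` list and the `classify` helper are hypothetical names introduced only for this illustration.

```python
import ipaddress

# Networks taken from the routing table in this message; the prefix
# lengths are derived from the Genmask column (255.255.255.0 -> /24,
# 255.255.0.0 -> /16).
ROUTES = ["192.125.17.0/24", "192.168.12.0/24", "161.254.0.0/16"]


def classify(cidr: str) -> str:
    """Return 'private' or 'public' for a network in CIDR notation.

    Hypothetical helper: it relies on ipaddress's is_private property,
    which covers the RFC 1918 ranges among other reserved blocks.
    """
    net = ipaddress.ip_network(cidr)
    return "private" if net.is_private else "public"


if __name__ == "__main__":
    for cidr in ROUTES:
        print(f"{cidr:18} {classify(cidr)}")
```

Under these assumptions the head node's 192.125.17.0 network should be reported as public and the workers' 192.168.12.0 network as private, which matches the description above.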
I have a very similar configuration that works just fine with
OpenMPI. In my case the head node has three interfaces and the worker
nodes each have two interfaces; the configuration is roughly:
master: eth0: 192.168.x.x, eth1 & eth2 bonded to 10.0.0.1
node2: eth0 & eth1 bonded to 10.0.0.2
nodeN: eth0 & eth1 bonded to 10.0.0.N
So our "outside" communication with the head node is on the 192.168
network and the internal communication is on the 10.0.0.x network.
In your case the "outside" communication is on the 192.125
network and the internal communication is on the 192.168 network.
The primary difference seems to be that you have all communication
going over a single interface.
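To make the single-interface observation concrete, here is a rough sketch that groups the destinations from the quoted `/sbin/route` output by interface. The `ROUTE_OUTPUT` string is copied from the routing table earlier in this thread; `routes_by_interface` is a hypothetical helper, and the parsing assumes the usual eight-column layout of `route` output.

```python
from collections import defaultdict

# Routing table as quoted earlier in this thread.
ROUTE_OUTPUT = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.125.17.0    *               255.255.255.0   U     0      0        0 eth1
192.168.12.0    *               255.255.255.0   U     0      0        0 eth1
161.254.0.0     *               255.255.0.0     U     0      0        0 eth1
default         192.125.17.1    0.0.0.0         UG    0      0        0 eth1
"""


def routes_by_interface(text: str) -> dict:
    """Map each interface name to the list of destinations routed over it.

    Hypothetical helper: assumes eight whitespace-separated columns per
    route line, with the interface name in the last column.
    """
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # Skip the title line, the column header, and anything malformed.
        if len(fields) != 8 or fields[0] == "Destination":
            continue
        groups[fields[7]].append(fields[0])
    return dict(groups)


if __name__ == "__main__":
    for iface, destinations in routes_by_interface(ROUTE_OUTPUT).items():
        print(iface, "->", ", ".join(destinations))
```

With the table above, every destination ends up under eth1, so both the public and the internal network share one interface.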
I'm a little surprised there is any problem at all with OpenMPI &
your configuration as my configuration is more complicated.
Michael