Re: [OMPI users] mpirun hangs

Emanuel Ziegler Fri, 24 Feb 2006 08:24:19 -0500

> So, the question from the mpirun_debug.out-file is, what IP-addresses do 
> node01 and node02 have, is the local 10.0.0.1 node01, while 10.1.0.1 is 
> node02?
> Maybe the route on node01 is not correct to node02?


Ok, I figured out the problem, but didn't solve it completely.

node01 and node02 both have multiple IP addresses.
node01 has 10.0.0.1 for TCP (eth1) and 10.1.0.1 for IPoIB (ib0).
node02 has 10.0.0.2 for TCP (eth1) and 10.1.0.2 for IPoIB (ib0).
The latter addresses are useless, but don't affect the problem. I chose
eth1 on both machines b/c eth0 is only 10/100 MBit and I wanted to have
GBit connections to the file server in the internal network. The problem
was that I had set up eth0 on node01 (golden client) using DHCP on the
external network for setup purposes. Hence, it also had an external
address (129.206.102.93), which was inaccessible from node02.

Since orterun was started with the parameters
    --nsreplica "0.0.0;tcp://129.206.102.93:54866;tcp://10.0.0.1:54866"
    --gprreplica "0.0.0;tcp://129.206.102.93:54866;tcp://10.0.0.1:54866"
node02 first tried to communicate with 129.206.102.93, which was
impossible, and hung, although it would have been able to access
10.0.0.1 without any problems. But obviously it never got to this point.
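
To illustrate the fallback I would have expected, here is a minimal
Python sketch. It is not Open MPI's actual code: the function names
parse_replica and first_reachable and the 2-second timeout are my own
assumptions. It splits a replica string like the one above into its
addresses and tries each in turn, moving on when a connection attempt
fails or times out instead of blocking on the first address.

```python
import socket
from urllib.parse import urlparse


def parse_replica(spec):
    """Split '<name>;tcp://host:port;...' into (name, [(host, port), ...])."""
    name, *uris = spec.split(";")
    endpoints = []
    for uri in uris:
        parsed = urlparse(uri)
        endpoints.append((parsed.hostname, parsed.port))
    return name, endpoints


def first_reachable(endpoints, timeout=2.0):
    """Return the first (host, port) that accepts a TCP connection, else None.

    The timeout (an assumed value) bounds each attempt, so an unreachable
    address delays the caller but cannot block it forever.
    """
    for host, port in endpoints:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return host, port
        except OSError:
            # Unreachable, refused, or timed out: try the next address.
            continue
    return None
```

Calling parse_replica on the --nsreplica value above yields the name
"0.0.0" and the two endpoints in the order orterun passed them; with a
bounded timeout, first_reachable would give up on 129.206.102.93 and
then try 10.0.0.1.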

Although disabling eth0 with "ifdown eth0" solves the problem, this is
not applicable to my cluster, since this was just a test setup and I need
the external address for my head node.

Can I configure orterun/orted to use only eth1?

Thanks for your help,
  Emanuel
