Re: [OMPI users] Cluster : received unexpected process identifier

Jeffrey Squyres Wed, 4 Apr 2012 20:11:38 -0400

On Apr 4, 2012, at 8:04 PM, Rohan Deshpande wrote:

> Yes they are on same subnet. ips for example - 192.168.1.1,  192.168.1.2,  
> 192.168.1.3


Putting multiple interfaces on the same subnet is generally considered a bad 
idea, not just for MPI, but for Linux in general.  Google around about this.  
One reason is that there is no way to guarantee which IP interface outgoing 
traffic will actually use.  For example, if you open a socket to a peer IP 
address (e.g., 192.168.1.10), which local IP address will be used as the 
source for that socket -- .1, .2, or .3?  There's no way to know.
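
If you want to see which source address a given node picks for a given 
peer, a minimal sketch along these lines may help.  The function name 
local_address_for and the port number are illustrative assumptions; only the 
standard Python socket module is used.  It reports what the kernel chose on 
that host at that moment, which is not something you can rely on when several 
interfaces share a subnet.

```python
import socket


def local_address_for(peer_ip: str, port: int = 9) -> str:
    """Return the local IPv4 address the kernel selects to reach peer_ip.

    The name and the default port are hypothetical; any port works because
    nothing is actually sent.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # connect() on a UDP socket transmits no packets; it only makes the
        # kernel choose a route and a source address for this destination.
        s.connect((peer_ip, port))
        # getsockname() returns (local_ip, local_port) for the socket.
        return s.getsockname()[0]


# Example (run on a cluster node; raises OSError if there is no route):
#   print(local_address_for("192.168.1.10"))
```

Running it on each node against the same peer shows which of the local 
addresses ends up as the source there.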

(This is exactly the scenario that OMPI was complaining about: it received a 
connection from an unexpected IP address, got confused, and basically said, 
"hey human, go figure this out.")

You need to put your IP interfaces on different IP subnets.  E.g., have eth0 on 
192.168.1.x/24, eth1 on 192.168.2.x/24, and eth2 on 192.168.3.x/24.  It depends 
on how your networks are configured and what hardware you have -- you can 
implement this with switch-based VLANs (e.g., the ports that the 1.x wires go 
into are hard-wired to VLAN 10, the ports that the 2.x wires go into are 
hard-wired to VLAN 20, etc.), or using multiple switches (e.g., each 1.x wire 
goes to switch A, each 2.x wire goes to switch B, etc.).
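
As a quick sanity check of an addressing plan, the sketch below groups 
interfaces by subnet and reports any subnet shared by more than one 
interface.  It uses Python's standard ipaddress module; the function name 
shared_subnets, the interface names assigned to the original addresses, the 
/24 prefix on the original addresses, and the host numbers in the suggested 
layout are all assumptions for illustration.

```python
import ipaddress
from collections import defaultdict


def shared_subnets(interfaces):
    """Map each subnet used by more than one interface to those interfaces.

    interfaces: dict of interface name -> address in CIDR form, e.g.
    {"eth0": "192.168.1.1/24"}.  Returns an empty dict when every
    interface sits on its own subnet.
    """
    by_network = defaultdict(list)
    for name, cidr in interfaces.items():
        # ip_interface() keeps the host bits; .network is the enclosing subnet.
        network = ipaddress.ip_interface(cidr).network
        by_network[network].append(name)
    return {str(net): names for net, names in by_network.items() if len(names) > 1}


# Assumed reconstruction of the original setup: all three on one subnet.
original = {
    "eth0": "192.168.1.1/24",
    "eth1": "192.168.1.2/24",
    "eth2": "192.168.1.3/24",
}

# Suggested layout: one subnet per interface (host numbers are made up).
suggested = {
    "eth0": "192.168.1.1/24",
    "eth1": "192.168.2.1/24",
    "eth2": "192.168.3.1/24",
}

print(shared_subnets(original))
print(shared_subnets(suggested))
```

An empty result means no two interfaces in the plan share a subnet.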

Make sense?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
