Re: [OMPI users] Have Trouble Setting Up a Ring Network Using Open MPI

2015-09-19 Thread George Bosilca
Whatever the original choice(s) of the BTL are, an interface should disqualify itself after a few missed connections (based on the retry MCA parameter). However, in order to get anything sensible in this configuration you should change the default timeout to a reasonable value (30 seconds?). Whil

Re: [OMPI users] Have Trouble Setting Up a Ring Network Using Open MPI

2015-09-18 Thread Jeff Squyres (jsquyres)
On Sep 18, 2015, at 7:26 PM, Gilles Gouaillardet wrote:
>
> I built a similar environment with master and private ip and that does not
> work.
> my understanding is each tasks has two tcp btl (one per interface),
> and there is currently no mechanism to tell that a node is unreachable
> via a g

Re: [OMPI users] Have Trouble Setting Up a Ring Network Using Open MPI

2015-09-18 Thread Gilles Gouaillardet
Jeff, I built a similar environment with master and private IP addresses, and that does not work. My understanding is that each task has two TCP BTLs (one per interface), and there is currently no mechanism to tell that a node is unreachable via a given BTL. (A BTL picks the "best" interface for each node, but it

Re: [OMPI users] Have Trouble Setting Up a Ring Network Using Open MPI

2015-09-18 Thread Shang Li
Hi Jeff, Thanks for your suggestion. (And also thanks to Gilles!) I'll play around with your suggestions and let you know if I make any progress. As for my Open MPI version, it's a Texas Instruments implementation, so the version number 1.0.0.22 is their own. I looked at their

Re: [OMPI users] Have Trouble Setting Up a Ring Network Using Open MPI

2015-09-18 Thread Jeff Squyres (jsquyres)
Whoa; wait -- are you really using Open MPI v1.0? That's over 10 years old... Can you update to Open MPI v1.10?

> On Sep 18, 2015, at 1:37 PM, Jeff Squyres (jsquyres) wrote:
>
> Open MPI uses different heuristics depending on whether IP addresses are
> public or private.
>
> All your IP

Re: [OMPI users] Have Trouble Setting Up a Ring Network Using Open MPI

2015-09-18 Thread Jeff Squyres (jsquyres)
Open MPI uses different heuristics depending on whether IP addresses are public or private. All your IP addresses are technically "public" -- they're not in 10.x.x.x or 192.168.x.x, for example. So Open MPI assumes that they are all routable to each other. You might want to change your 3 netwo

Re: [OMPI users] Have Trouble Setting Up a Ring Network Using Open MPI

2015-09-17 Thread Gilles Gouaillardet
Shang, can you please run mpirun --version? I cannot find the ompi version you are running based on the git hash you reported. As a temporary workaround, you can do minimal TCP routing on the three nodes:
1) run sysctl -w net.ipv4.ip_forward=1
2) route the other nodes interface not on the same

[OMPI users] Have Trouble Setting Up a Ring Network Using Open MPI

2015-09-17 Thread Shang Li
Hi all, I wanted to set up a 3-node ring network in which each node connects to the other two directly through 2 Ethernet ports, without a switch/router. The interface configuration is shown in the following picture: https://www.dropbox.com/s/g75i51rrjs51b21/mpi-graph%20-%20New%20Page.png?dl=0 I've used *i
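To make the ring layout concrete, here is a rough sketch of such a topology, assuming each of the three point-to-point links is its own subnet. The node names and addresses are hypothetical (the real ones are only in the linked picture), and `shared_link` is an illustrative helper, not part of Open MPI.

```python
import ipaddress

# Hypothetical addressing: three direct links, one /24 per link.
# Each node has two interfaces, one towards each neighbour.
INTERFACES = {
    "node0": ["10.0.1.1/24", "10.0.3.2/24"],
    "node1": ["10.0.1.2/24", "10.0.2.1/24"],
    "node2": ["10.0.2.2/24", "10.0.3.1/24"],
}


def shared_link(a: str, b: str):
    """Return (address on a, address on b) for the subnet both nodes share, or None."""
    for if_a in map(ipaddress.ip_interface, INTERFACES[a]):
        for if_b in map(ipaddress.ip_interface, INTERFACES[b]):
            if if_a.network == if_b.network:
                return str(if_a.ip), str(if_b.ip)
    return None


if __name__ == "__main__":
    for a, b in [("node0", "node1"), ("node1", "node2"), ("node0", "node2")]:
        print(a, b, shared_link(a, b))
```

In a layout like this, every pair of nodes shares exactly one subnet, so each peer is directly reachable through only one of a node's two interfaces.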