Hi Jeff,

Thanks for your suggestion. (And also thanks to Gilles!) I'll play around with your suggestions and let you know if I make any progress.
About the version of my Open MPI: it's a Texas Instruments implementation, so the version number 1.0.0.22 is their own. I looked at their documentation, and it says it's based on Open MPI 1.7.1. So I guess it's not that old, lol.

Thanks again,
Shang

On Fri, Sep 18, 2015 at 1:38 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> Whoa; wait -- are you really using Open MPI v1.0?
>
> That's over 10 years old...
>
> Can you update to Open MPI v1.10?
>
> > On Sep 18, 2015, at 1:37 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> >
> > Open MPI uses different heuristics depending on whether IP addresses are public or private.
> >
> > All your IP addresses are technically "public" -- they're not in 10.x.x.x or 192.168.x.x, for example.
> >
> > So Open MPI assumes that they are all routable to each other.
> >
> > You might want to change your 3 networks to be 10.1.x.x/16, 10.2.x.x/16, and 10.3.x.x/16. See if that makes it work.
> >
> >> On Sep 17, 2015, at 12:31 PM, Shang Li <shawn.li.x...@gmail.com> wrote:
> >>
> >> Hi all,
> >>
> >> I wanted to set up a 3-node ring network, where each node connects to the other 2 directly through 2 Ethernet ports, without a switch/router.
> >>
> >> The interface configurations can be found in the following picture.
> >>
> >> https://www.dropbox.com/s/g75i51rrjs51b21/mpi-graph%20-%20New%20Page.png?dl=0
> >>
> >> I've used ifconfig on each node to configure each port, and made sure I can ssh from each node to the other 2 nodes.
> >>
> >> But a simple ring_c example doesn't work... So I turned on --mca btl_base_verbose 30, and I could see that node1 was trying to use 23.0.0.2 (the link between node2 and node3) to get to node2, even though there is a direct link to node2.
> >>
> >> The output log is like:
> >>
> >> [node1:01828] btl: tcp: attempting to connect() to [[19529,1],1] address 23.0.0.2 on port 1024
> >> [[19529,1],0][btl_tcp_endpoint.c:606:mca_btl_tcp_endpoint_start_connect] from node1 to: node2 Unable to connect to the peer 23.0.0.2 on port 4: Network is unreachable
> >>
> >> I've read the following posts and FAQs but still couldn't understand this kind of behavior.
> >>
> >> http://www.open-mpi.org/faq/?category=tcp#tcp-routability-1.3
> >> http://www.open-mpi.org/faq/?category=tcp#tcp-selection
> >> http://www.open-mpi.org/community/lists/users/2014/11/25810.php
> >>
> >> Any pointers would be appreciated! Thanks in advance!
> >>
> >> My open-mpi info:
> >>
> >> Package: Open MPI gtbldadm@ubuntu-12 Distribution
> >> Open MPI: 1.0.0.22
> >> Open MPI repo revision: git714842d
> >> Open MPI release date: May 27, 2015
> >> Open RTE: 1.0.0.22
> >> Open RTE repo revision: git714842d
> >> Open RTE release date: May 27, 2015
> >> OPAL: 1.0.0.22
> >> OPAL repo revision: git714842d
> >> OPAL release date: May 27, 2015
> >> MPI API: 2.1
> >>
> >> Best,
> >> Shawn
> >>
> >> _______________________________________________
> >> users mailing list
> >> us...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> >> Link to this post: http://www.open-mpi.org/community/lists/users/2015/09/27612.php
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2015/09/27627.php
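
P.S. To make the public/private distinction Jeff describes concrete, here is a minimal Python sketch. It is not Open MPI's actual reachability code; it only checks addresses against the RFC 1918 private ranges and tests whether two addresses share a /16, which is roughly what the suggested renumbering relies on. The address 23.0.0.2 comes from the log above; the 10.x addresses and the helper names `is_rfc1918` and `same_subnet` are made up for illustration.

```python
import ipaddress

# RFC 1918 private IPv4 ranges (the "private" side of the heuristic).
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(addr: str) -> bool:
    """Return True if addr falls inside one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918_NETWORKS)


def same_subnet(a: str, b: str, prefix_len: int = 16) -> bool:
    """Return True if b lies in the network that a belongs to."""
    net_a = ipaddress.ip_interface(f"{a}/{prefix_len}").network
    return ipaddress.ip_address(b) in net_a


if __name__ == "__main__":
    # 23.0.0.2 is from the log; the 10.x addresses are hypothetical
    # examples of the suggested 10.1.x.x/16 and 10.2.x.x/16 layout.
    for addr in ("23.0.0.2", "10.1.0.1", "10.2.0.1"):
        kind = "private" if is_rfc1918(addr) else "public"
        print(f"{addr}: {kind}")

    print(same_subnet("10.1.0.1", "10.1.0.2"))  # both ends of one link
    print(same_subnet("10.1.0.1", "10.2.0.1"))  # ends of different links
```

With one private /16 per point-to-point link, the two ends of a link share a subnet and the ends of different links do not, which is the property the renumbering is meant to expose.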