Hi all,

I want to set up a 3-node ring network in which each node connects to the other 2 directly through 2 Ethernet ports, without a switch or router.
The interface configuration is shown in the following picture:
https://www.dropbox.com/s/g75i51rrjs51b21/mpi-graph%20-%20New%20Page.png?dl=0

I used *ifconfig* on each node to configure each port, and made sure I can ssh from each node to the other 2 nodes. But a simple ring_c <https://github.com/open-mpi/ompi/blob/master/examples/ring_c.c> example doesn't work.

So I turned on --mca btl_base_verbose 30 and could see that node1 was trying to use 23.0.0.2 (the link between node2 and node3) to reach node2, even though node1 has a direct link to node2. The output log looks like this:

> [node1:01828] btl: tcp: attempting to connect() to [[19529,1],1] address 23.0.0.2 on port 1024
> [[19529,1],0][btl_tcp_endpoint.c:606:mca_btl_tcp_endpoint_start_connect] from node1 to: node2 Unable to connect to the peer 23.0.0.2 on port 4: Network is unreachable

I've read the following posts and FAQs but still can't understand this behavior:
http://www.open-mpi.org/faq/?category=tcp#tcp-routability-1.3
http://www.open-mpi.org/faq/?category=tcp#tcp-selection
http://www.open-mpi.org/community/lists/users/2014/11/25810.php

Any pointers would be appreciated! Thanks in advance!

My Open MPI info:
Package: Open MPI gtbldadm@ubuntu-12 Distribution
Open MPI: 1.0.0.22
Open MPI repo revision: git714842d
Open MPI release date: May 27, 2015
Open RTE: 1.0.0.22
Open RTE repo revision: git714842d
Open RTE release date: May 27, 2015
OPAL: 1.0.0.22
OPAL repo revision: git714842d
OPAL release date: May 27, 2015
MPI API: 2.1

Best,
Shawn