Shang,

Can you please run
mpirun --version
I cannot identify the Open MPI version you are running from the git hash you reported.

As a temporary workaround, you can set up minimal TCP routing.
On each of the three nodes:
1) enable IP forwarding:
sysctl -w net.ipv4.ip_forward=1

2) add host routes to the other nodes' interfaces that are not on a directly connected network.
For example, on node 1, you can run
route add -host 23.0.0.2 gw 12.0.0.2
route add -host 23.0.0.3 gw 13.0.0.3
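
The same two steps for the other nodes can be sketched with a small dry-run helper. This is only a sketch: it assumes the addressing follows the pattern of the node 1 example (the link between nodes A and B uses network AB.0.0.x, and node N has address AB.0.0.N on that link), which I could not verify against your picture. The names routes_for and link_net are made up for this example. It only prints the commands, so you can review them before running anything as root.

```shell
#!/bin/sh
# Dry-run helper: prints (does not execute) the workaround commands for one node.
# ASSUMPTION: the link between nodes A and B (A < B) uses network AB.0.0.x,
# and node N has address AB.0.0.N on that link (inferred from the node 1 example).

# First octet of the link between two nodes, lower node number first.
link_net() {
    if [ "$1" -lt "$2" ]; then echo "$1$2"; else echo "$2$1"; fi
}

# routes_for NODE: print the commands for node 1, 2 or 3.
routes_for() {
    node=$1
    case "$node" in
        1) a=2; b=3 ;;
        2) a=1; b=3 ;;
        3) a=1; b=2 ;;
        *) echo "usage: routes_for 1|2|3" >&2; return 1 ;;
    esac
    # The one link this node is not attached to.
    far=$(link_net "$a" "$b")
    echo "sysctl -w net.ipv4.ip_forward=1"
    # Reach each address on the far link through the neighbor that owns it.
    echo "route add -host $far.0.0.$a gw $(link_net "$node" "$a").0.0.$a"
    echo "route add -host $far.0.0.$b gw $(link_net "$node" "$b").0.0.$b"
}
```

Calling routes_for 1 reproduces the three commands above; routes_for 2 and routes_for 3 give the equivalent commands for the other two nodes under the same addressing assumption.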

Cheers,

Gilles

On 9/18/2015 1:31 AM, Shang Li wrote:
Hi all,

I want to set up a 3-node ring network in which each node connects directly to the other 2 through 2 Ethernet ports, without a switch/router.

The interface configurations can be found in the following picture.

https://www.dropbox.com/s/g75i51rrjs51b21/mpi-graph%20-%20New%20Page.png?dl=0

I've used ifconfig on each node to configure each port, and made sure I can ssh from each node to the other 2 nodes.

But a simple ring_c <https://github.com/open-mpi/ompi/blob/master/examples/ring_c.c> example doesn't work. So I turned on --mca btl_base_verbose 30, and I could see that node1 was trying to use 23.0.0.2 (the link between node2 and node3) to get to node2, even though there is a direct link to node2.

The output log is like:

    [node1:01828] btl: tcp: attempting to connect() to [[19529,1],1]
    address 23.0.0.2 on port 1024
    [[19529,1],0][btl_tcp_endpoint.c:606:mca_btl_tcp_endpoint_start_connect]
    from node1 to: node2 Unable to connect to the peer 23.0.0.2  on
    port 4: Network is unreachable


I've read the following posts and FAQs but still couldn't understand this kind of behavior.

http://www.open-mpi.org/faq/?category=tcp#tcp-routability-1.3
http://www.open-mpi.org/faq/?category=tcp#tcp-selection
http://www.open-mpi.org/community/lists/users/2014/11/25810.php


Any pointers would be appreciated! Thanks in advance!

My open-mpi info:

 Package: Open MPI gtbldadm@ubuntu-12 Distribution
                Open MPI: 1.0.0.22
  Open MPI repo revision: git714842d
   Open MPI release date: May 27, 2015
                Open RTE: 1.0.0.22
  Open RTE repo revision: git714842d
   Open RTE release date: May 27, 2015
                    OPAL: 1.0.0.22
      OPAL repo revision: git714842d
       OPAL release date: May 27, 2015
                 MPI API: 2.1


Best,
Shawn



_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2015/09/27612.php
