The current version of Open MPI doesn't handle such situations. You either have 
to configure your NAT differently or get your hands on one of the NAT-aware 
versions described here: 
http://www-lipn.univ-paris13.fr/~coti/QosCosGrid/qcgompi.php
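
As an aside: ompi_info only reports the current values of MCA parameters; it 
never launches a job, so --mca flags given to ompi_info have no effect on a 
run. The parameters have to go on the mpirun command line. A minimal sketch, 
assuming a hypothetical executable ./hello:

  mpirun --mca btl_tcp_if_include 210.1.1.0/24 \
         --mca oob_tcp_if_include 210.1.1.0/24 \
         -np 1 -host 210.1.1.137 ./hello

(Note also that a /8 mask keeps only the first octet, so "210.1.1.0/8" is the 
same network as "210.0.0.0/8"; a /24 is what pins the match to 210.1.1.x.)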

  george.

On Oct 10, 2011, at 12:14 , (.-=Kiwi=-.) wrote:

> I'm confused... my IPs right now are:
> 
> Computer 1 (192.168.31.2 internal / 210.1.1.39 external)
> Computer 2 (192.168.31.3 internal / 210.1.1.40 external)
> Computer 3 (210.1.1.137)
> 
> I want Computer 1 to launch mpirun and Computer 3 to do the task.
> 
> I tried both these commands first on Computer 1 and then also on Computer 3:
> 
> ompi_info --mca btl_tcp_if_include "210.0.0.0/8" --mca oob_tcp_if_include 
> "210.0.0.0/8" (didn't work; Computer 3 tries to answer to 192.168.31.2 
> instead of 210.1.1.39)
> ompi_info --mca btl_tcp_if_include "210.1.1.0/8" --mca oob_tcp_if_include 
> "210.1.1.0/8" (the same; it still answers to the wrong IP).
> 
> What am I doing wrong?
> 
> ---
> 
> 
> 
> On Wed, Oct 5, 2011 at 8:08 PM, George Bosilca <bosi...@eecs.utk.edu> wrote:
> The real solution is to exclude the private addresses at both levels (MPI and 
> ORTE). However, based on the ordering of the interfaces, I guess you cannot 
> do it by interface name (eth0 has a private address on one side but a public 
> one on the other).
> 
> No panic! There is support for this.
> 
> Look at the output of "ompi_info --param btl tcp", quoted below:
> 
> >  MCA btl: parameter "btl_tcp_if_include" (current value: <none>, data
> >           source: default value)
> >           Comma-delimited list of devices or CIDR notation of networks
> >           to use for MPI communication (e.g., "eth0,eth1" or
> >           "192.168.0.0/16,10.1.4.0/24").  Mutually exclusive with
> >           btl_tcp_if_exclude.
> >  MCA btl: parameter "btl_tcp_if_exclude" (current value: <lo,sppp>, data
> >           source: default value)
> >           Comma-delimited list of devices or CIDR notation of networks
> >           to NOT use for MPI communication -- all devices not matching
> >           these specifications will be used (e.g., "eth0,eth1" or
> >           "192.168.0.0/16,10.1.4.0/24").  Mutually exclusive with
> >           btl_tcp_if_include.
> 
> You can use the [btl|oob]_tcp_if_[include|exclude] either with names or with 
> IP ranges. Add the following to your mpirun:
> 
> --mca btl_tcp_if_include "210.0.0.0/8" --mca oob_tcp_if_include "210.0.0.0/8"
> 
> and everything should work in all cases.
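> 
> If you don't want to retype these on every run, the same parameters can also 
> be set as environment variables -- each MCA parameter maps to an 
> OMPI_MCA_<name> variable. A sketch (the host and executable are placeholders):
> 
>   export OMPI_MCA_btl_tcp_if_include=210.0.0.0/8
>   export OMPI_MCA_oob_tcp_if_include=210.0.0.0/8
>   mpirun -np 1 -host <public-ip-of-remote-host> ./hello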
> 
>  george.
> 
> On Oct 5, 2011, at 12:13 , Jeff Squyres wrote:
> 
> > Check out this FAQ entry:
> >
> >    http://www.open-mpi.org/faq/?category=tcp#tcp-selection
> >
> > Note that there are btl_tcp_if_include / btl_tcp_if_exclude: these control 
> > MPI-level communications.  There's also oob_tcp_if_include / 
> > oob_tcp_if_exclude (that take the same kinds of values as 
> > btl_tcp_if_include/exclude) that control OMPI's run-time environment 
> > communications.
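> >
> > If you want this to be permanent rather than per-command-line, here is a 
> > sketch of the per-user parameter file that Open MPI reads at startup, 
> > $HOME/.openmpi/mca-params.conf (it would need to exist on every machine 
> > involved):
> >
> >   # networks to use for MPI point-to-point traffic
> >   btl_tcp_if_include = 210.0.0.0/8
> >   # networks to use for the run-time (ORTE) wire-up
> >   oob_tcp_if_include = 210.0.0.0/8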
> >
> >
> > On Oct 5, 2011, at 12:01 PM, (.-=Kiwi=-.) wrote:
> >
> >> "OMPI always tries to use the lowest numbered address first - just a 
> >> natural ordering."
> >>
> >> That doesn't seem to be the reason. We changed the private IPs to 212... 
> >> (a higher number than the public 210... IPs) and MPI still tries to reach 
> >> the 212... addresses.
> >>
> >> We're reading about the oob_tcp and btl_tcp parameters but we're not sure 
> >> how to use them.
> >>
> >> "But if hello world doesn't even run, then try running with "mpirun --mca 
> >> oob_tcp_if_include <the interface(s) you want to use> ...", per Ralph's 
> >> suggestion.  If *that* doesn't work, also add "--mca btl_tcp_if_include 
> >> ..." as well."
> >>
> >> We tried doing from Computer 1:
> >>
> >> orterun -mca oob_tcp_debug 1 -np 1 -host 212...3 ifconfig
> >>
> >> and everything was ok
> >>
> >> We tried doing from Computer 1:
> >>
> >> orterun -mca oob_tcp_debug 1 -np 1 -host 210...101 ifconfig
> >>
> >> and it says:
> >>
> >> There are no allocated resources for the application
> >>  ifconfig
> >> that match the requested mapping:
> >>
> >>
> >> Verify that you have mapped the allocated resources properly using the
> >> --host or --hostfile specification.
> >> --------------------------------------------------------------------------
> >> --------------------------------------------------------------------------
> >> A daemon (pid unknown) died unexpectedly on signal 1  while attempting to
> >> launch so we are aborting. [...]
> >>
> >> Any other ideas?
> >>
> >>
> >> On Wed, Oct 5, 2011 at 1:54 AM, Ralph Castain <rhc.open...@gmail.com> 
> >> wrote:
> >> OMPI always tries to use the lowest numbered address first - just a 
> >> natural ordering. You need to tell it to use just the public ones for this 
> >> topology. Use the oob_tcp and btl_tcp parameters to do this. See 
> >> "ompi_info --param oob tcp" and "ompi_info --param btl tcp" for the exact 
> >> syntax.
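> >>
> >> For example, to see just the interface-selection parameters (the grep is 
> >> only a convenience filter):
> >>
> >>   ompi_info --param oob tcp | grep if_
> >>   ompi_info --param btl tcp | grep if_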
> >>
> >>
> >> Sent from my iPad
> >>
> >> On Oct 4, 2011, at 10:21 AM, "(.-=Kiwi=-.)" <heffe...@gmail.com> wrote:
> >>
> >>> We are setting up a group of computers with Open MPI and have run into a 
> >>> small problem mixing public and private IPs.
> >>>
> >>> We aren't sure about what's causing the problem or how to solve it.
> >>>
> >>> The files are shared via NFS. We have a couple of computers with both 
> >>> private and public IPs, and we want them to send MPI work to some 
> >>> machines that have only public IPs.
> >>>
> >>> I'll try to describe the setup with example IPs.
> >>>
> >>> Computer 1 sees itself as eth0:  172...2  but has a public IP assigned:  
> >>> 210...2
> >>> Computer 2 sees itself as eth0:  172...3  but has a public IP assigned:  
> >>> 210...3
> >>> Computers outside the subnet directly have public IPs assigned:  
> >>> 210...100+
> >>>
> >>> The outside computers see Computers 1 and 2 only as 210...; they can't 
> >>> see the 172... internal IPs.
> >>>
> >>> If an outside computer launches mpirun to Computer 1, it works ok.
> >>> If Computer 1 tries to launch mpirun to Computer 2 (with 172...) it also 
> >>> works ok (not with 210... because they don't know that that's their 
> >>> public IP, but that's not an issue).
> >>>
> >>> The problem comes when Computer 1 or 2 try to launch mpirun to outside 
> >>> computers.
> >>>
> >>> To see what was happening, we installed Wireshark on an outside computer. 
> >>> The ssh part works fine (the ssh exchange between 210...2 and 210...101 
> >>> is OK), but after that the outside computer sends its TCP SYN packet to 
> >>> 172...2 instead of 210...2, and the rest of the packets follow the same 
> >>> pattern.
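> >>>
> >>> (The same capture can be done without a GUI; a sketch with tcpdump in 
> >>> place of Wireshark, assuming eth0 is the interface to watch:
> >>>
> >>>   tcpdump -n -i eth0 'tcp[tcpflags] & tcp-syn != 0 and dst net 172.0.0.0/8'
> >>>
> >>> which shows the SYNs heading for the unreachable private addresses.)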
> >>>
> >>> Is there a way to solve this problem?
> >>>
> >>> I've read this ( 
> >>> http://www.open-mpi.org/community/lists/users/2009/11/11184.php ) but I'm 
> >>> not really sure what he's doing there.
> >>>
> >>> We could plug Computer 1 and Computer 2 directly into the switch the 
> >>> outside computers are on, but we'd prefer them to stay on the private 
> >>> network. If there's no other way, though, I guess we can.
> >>>
> >>> Can it be done without having to change the network topology?
> >>>
> >>> Thanks in advance.
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/