Ah, you're dealing with NAT.  Sorry, I didn't realize that earlier.

OMPI currently doesn't handle NAT well.  :-(

There was some work at U. Tennessee to handle NAT nicely, but I think they 
forked off and made their own release based on an older version of Open MPI.  
...or maybe I'm remembering that totally incorrectly.  

George / someone from UT -- can you comment on this?
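
To make the failure mode concrete, here is a rough Python sketch of what I believe is going on. It is a toy model, not Open MPI code. The Host class, the function names, and the addresses (172.16.0.2, 210.0.0.2, 210.0.0.101) are invented stand-ins for the elided addresses in this thread. The one assumption it encodes is that a process can only advertise the address its own interface reports, which matches the Wireshark trace showing the outside machine sending its SYN to 172...2.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Host:
    """Toy model of a machine; all values used below are hypothetical."""
    name: str
    interface_addr: str  # what eth0 reports on the machine itself
    public_addr: str     # what peers outside the NAT actually see


def advertised_addr(host: Host) -> str:
    # Assumption: the process knows nothing about the NAT mapping, so the
    # only address it can hand to its peers is the one on its interface.
    return host.interface_addr


def outside_peer_can_connect(host: Host) -> bool:
    # A peer outside the NAT can connect back only if the advertised
    # address is the same one the outside world routes to this host.
    advertised = ipaddress.ip_address(advertised_addr(host))
    public = ipaddress.ip_address(host.public_addr)
    return advertised == public


# Hypothetical stand-ins for "Computer 1" and an outside machine.
computer1 = Host("computer1", interface_addr="172.16.0.2", public_addr="210.0.0.2")
outside = Host("outside", interface_addr="210.0.0.101", public_addr="210.0.0.101")

for h in (computer1, outside):
    print(h.name, "advertises", advertised_addr(h),
          "-> reachable from outside:", outside_peer_can_connect(h))
```

In this model, renumbering the private side (172... to 212...) changes nothing: the advertised address still differs from the public one. With a single eth0 there is also no alternative interface for the if_include parameters to pick.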



On Oct 5, 2011, at 12:24 PM, (.-=Kiwi=-.) wrote:

> The thing is that there's just one interface: eth0.
> 
> Computer 1 thinks that it has 212... but it actually has a 210 when accessed 
> from outside. There's no other interface to choose from, just the one that 
> thinks it's a 212, the eth0.
> 
> Or maybe I'm just not understanding correctly.
> 
> ---
> 
> 
> 
> On Wed, Oct 5, 2011 at 6:13 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
> Check out this FAQ entry:
> 
>    http://www.open-mpi.org/faq/?category=tcp#tcp-selection
> 
> Note that there are btl_tcp_if_include / btl_tcp_if_exclude: these control 
> MPI-level communications.  There's also oob_tcp_if_include / 
> oob_tcp_if_exclude (that take the same kinds of values as 
> btl_tcp_if_include/exclude) that control OMPI's run-time environment 
> communications.
> 
> 
> On Oct 5, 2011, at 12:01 PM, (.-=Kiwi=-.) wrote:
> 
> > "OMPI always tries to use the lowest numbered address first - just a 
> > natural ordering."
> >
> > That doesn't seem to be the reason. We changed the private IPs to 212... (a 
> > higher number than the public 210... IPs) and still MPI tries to go to 212 
> > afterwards.
> >
> > We're reading the oob_tcp and btl_tcp parameters but we're not sure how to 
> > do it.
> >
> > "But if hello world doesn't even run, then try running with "mpirun --mca 
> > oob_tcp_if_include <the interface(s) you want to use> ...", per Ralph's 
> > suggestion.  If *that* doesn't work, also add "--mca btl_tcp_if_include 
> > ..." as well."
> >
> > We tried doing from Computer 1:
> >
> > orterun -mca oob_tcp_debug 1 -np 1 -host 212...3 ifconfig
> >
> > and everything was ok
> >
> > We tried doing from Computer 1:
> >
> > orterun -mca oob_tcp_debug 1 -np 1 -host 210...101 ifconfig
> >
> > and it says:
> >
> > There are no allocated resources for the application
> >   ifconfig
> > that match the requested mapping:
> >
> >
> > Verify that you have mapped the allocated resources properly using the
> > --host or --hostfile specification.
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > A daemon (pid unknown) died unexpectedly on signal 1  while attempting to
> > launch so we are aborting. [...]
> >
> > Any other ideas?
> >
> >
> > On Wed, Oct 5, 2011 at 1:54 AM, Ralph Castain <rhc.open...@gmail.com> wrote:
> > OMPI always tries to use the lowest numbered address first - just a natural 
> > ordering. You need to tell it to use just the public ones for this 
> > topology. Use the oob_tcp and btl_tcp parameters to do this. See "ompi_info 
> > --param oob tcp" and "ompi_info --param btl tcp" for the exact syntax.
> >
> >
> > On Oct 4, 2011, at 10:21 AM, "(.-=Kiwi=-.)" <heffe...@gmail.com> wrote:
> >
> >> We are constructing a set of computers with Open MPI and there's a small 
> >> problem with mixing public and private IPs.
> >>
> >> We aren't sure about what's causing the problem or how to solve it.
> >>
> >> The files are shared over NFS. We have a couple of computers that have 
> >> both private and public IPs, and we want them to send MPI work to some 
> >> machines that have only public IPs.
> >>
> >> I'm going to try to describe with example IPs.
> >>
> >> Computer 1 sees itself as eth0:  172...2  but has a public IP assigned:  
> >> 210...2
> >> Computer 2 sees itself as eth0:  172...3  but has a public IP assigned:  
> >> 210...3
> >> Computers outside the subnet directly have public IPs assigned:  210...100+
> >>
> >> The computers outside see Computer 1 and 2 only with 210... they can't see 
> >> the 172... internal IPs.
> >>
> >> If an outside computer launches mpirun to Computer 1, it works ok.
> >> If Computer 1 tries to launch mpirun to Computer 2 (with 172...) it also 
> >> works ok (not with 210... because they don't know that that's their public 
> >> IP, but that's not an issue).
> >>
> >> The problem comes when Computer 1 or 2 try to launch mpirun to outside 
> >> computers.
> >>
> >> We checked what was happening by installing Wireshark on an outside 
> >> computer. The ssh part works ok (the ssh talk between 210...2 and 
> >> 210...101 is ok), but after that the outside computer tries to send a TCP 
> >> SYN packet to 172...2 instead of 210...2, and all subsequent packets go to 
> >> 172...2 as well.
> >>
> >> Is there a way to solve this problem?
> >>
> >> I've read this ( 
> >> http://www.open-mpi.org/community/lists/users/2009/11/11184.php ) but I'm 
> >> not really sure what he's doing there.
> >>
> >> We have the option of plugging Computer 1 and Computer 2 directly to the 
> >> switch that the outside computers are on, but we'd rather not because we'd 
> >> prefer the computers to stay on the private network, but if there's no 
> >> other way, I guess we can.
> >>
> >> Can it be done without having to change the network topology?
> >>
> >> Thanks in advance.
> >> _______________________________________________
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

