Ah...you're dealing with NAT. Sorry, I didn't understand that. OMPI currently doesn't handle NAT well. :-(
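To make the failure mode concrete, here is a minimal toy model of what seems to be happening. It is not Open MPI code. The addresses and the names `NAT_TABLE`, `advertised_address`, `reachable_from_outside`, and `address_outside_peers_need` are hypothetical and chosen only for illustration, since the real addresses in this thread are elided. The assumption is that a process publishes whatever address its own interface reports, and that nothing translates it to the address outside hosts can actually reach.

```python
# Hypothetical addresses for illustration only; the thread elides the real ones.
PRIVATE_ADDR = "172.16.0.2"   # what eth0 reports on the node behind the NAT
PUBLIC_ADDR = "203.0.113.2"   # what outside hosts see for that same node

# One-to-one NAT table: interface address -> externally visible address.
NAT_TABLE = {PRIVATE_ADDR: PUBLIC_ADDR}


def advertised_address(interface_addr: str) -> str:
    """Address a process hands to its peers: simply what its interface reports."""
    return interface_addr


def reachable_from_outside(addr: str) -> bool:
    """Simplified model: outside hosts can only connect to the NAT's public side."""
    return addr in NAT_TABLE.values()


def address_outside_peers_need(interface_addr: str) -> str:
    """The translation step that is missing: map to the public address if NAT'd."""
    return NAT_TABLE.get(interface_addr, interface_addr)


if __name__ == "__main__":
    published = advertised_address(PRIVATE_ADDR)
    print("published:", published,
          "reachable from outside:", reachable_from_outside(published))
    needed = address_outside_peers_need(PRIVATE_ADDR)
    print("needed:   ", needed,
          "reachable from outside:", reachable_from_outside(needed))
```

In this model, choosing a different interface does not help when there is only one: the only address the node can publish is the one outside peers cannot connect back to.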
There was some work at U. Tennessee to handle NAT nicely, but I think they forked off and made their own release based on an older version of Open MPI. ...or maybe I'm remembering that totally incorrectly. George / someone from UT -- can you comment on this?

On Oct 5, 2011, at 12:24 PM, (.-=Kiwi=-.) wrote:

> The thing is that there's just one interface: eth0.
>
> Computer 1 thinks that it has 212... but it actually has a 210 when accessed
> from outside. There's no other interface to choose from, just the one that
> thinks it's a 212, the eth0.
>
> Or maybe I'm just not understanding correctly.
>
> ---
>
> On Wed, Oct 5, 2011 at 6:13 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
> Check out this FAQ entry:
>
> http://www.open-mpi.org/faq/?category=tcp#tcp-selection
>
> Note that there are btl_tcp_if_include / btl_tcp_if_exclude: these control
> MPI-level communications. There's also oob_tcp_if_include /
> oob_tcp_if_exclude (that take the same kinds of values as
> btl_tcp_if_include/exclude) that control OMPI's run-time environment
> communications.
>
>
> On Oct 5, 2011, at 12:01 PM, (.-=Kiwi=-.) wrote:
>
> > "OMPI always tries to use the lowest numbered address first - just a
> > natural ordering."
> >
> > That doesn't seem to be the reason. We changed the private IPs to 212... (a
> > higher number than the public 210... IPs) and still MPI tries to go to 212
> > afterwards.
> >
> > We're reading the oob_tcp and btl_tcp parameters but we're not sure how to
> > do it.
> >
> > "But if hello world doesn't even run, then try running with "mpirun --mca
> > oob_tcp_if_include <the interface(s) you want to use> ...", per Ralph's
> > suggestion. If *that* doesn't work, also add "--mca btl_tcp_if_include
> > ..." as well."
> >
> > We tried doing from Computer 1:
> >
> > orterun -mca oob_tcp_debug 1 -np 1 -host 212...3 ifconfig
> >
> > and everything was ok
> >
> > We tried doing from Computer 1:
> >
> > orterun -mca oob_tcp_debug 1 -np 1 -host 210...101 ifconfig
> >
> > and it says:
> >
> > There are no allocated resources for the application
> > ifconfig
> > that match the requested mapping:
> >
> >
> > Verify that you have mapped the allocated resources properly using the
> > --host or --hostfile specification.
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
> > launch so we are aborting. [...]
> >
> > Any other ideas?
> >
> >
> > On Wed, Oct 5, 2011 at 1:54 AM, Ralph Castain <rhc.open...@gmail.com> wrote:
> > OMPI always tries to use the lowest numbered address first - just a natural
> > ordering. You need to tell it to use just the public ones for this
> > topology. Use the oob_tcp and btl_tcp parameters to do this. See "ompi_info
> > --param oob tcp" and "ompi_info --param btl tcp" for the exact syntax.
> >
> >
> > Sent from my iPad
> >
> > On Oct 4, 2011, at 10:21 AM, "(.-=Kiwi=-.)" <heffe...@gmail.com> wrote:
> >
> >> We are constructing a set of computers with Open MPI and there's a small
> >> problem with mixing public and private IPs.
> >>
> >> We aren't sure about what's causing the problem or how to solve it.
> >>
> >> The files are shared thanks to NFS and we have a couple computers with
> >> private IPs and public IPs that we want them to send MPI work to some
> >> machines that have public IPs.
> >>
> >> I'm going to try to describe with example IPs.
> >>
> >> Computer 1 sees itself as eth0: 172...2 but has a public IP assigned:
> >> 210...2
> >> Computer 2 sees itself as eth0: 172...3 but has a public IP assigned:
> >> 210...3
> >> Computers outside the subnet directly have public IPs assigned: 210...100+
> >>
> >> The computers outside see Computer 1 and 2 only with 210... they can't see
> >> the 172... internal IPs.
> >>
> >> If an outside computer launches mpirun to Computer 1, it works ok.
> >> If Computer 1 tries to launch mpirun to Computer 2 (with 172...) it also
> >> works ok (not with 210... because they don't know that that's their public
> >> IP, but that's not an issue).
> >>
> >> The problem comes when Computer 1 or 2 try to launch mpirun to outside
> >> computers.
> >>
> >> We tried to check out what was happening and installed wireshark on an
> >> outside computer and it seems that the ssh part works ok (the ssh talk
> >> between 210...2 and 210...101 is ok), but after that the outside computer
> >> tries to send a TCP SYN packet to 172...2 instead of 210...2 and the rest
> >> of the packets onward the same.
> >>
> >> Is there a way to solve this problem?
> >>
> >> I've read this (
> >> http://www.open-mpi.org/community/lists/users/2009/11/11184.php ) but I'm
> >> not really sure what he's doing there.
> >>
> >> We have the option of plugging Computer 1 and Computer 2 directly to the
> >> switch that the outside computers are on, but we'd rather not because we'd
> >> prefer the computers to stay on the private network, but if there's no
> >> other way, I guess we can.
> >>
> >> Can it be done without having to change the network topology?
> >>
> >> Thanks in advance.
> >> _______________________________________________
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/