The thing is that there's just one interface: eth0. Computer 1 thinks its address is 212..., but from outside it is reached as 210.... There's no other interface to choose from, only eth0, which believes it is 212....
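For reference, this is roughly how I understand the suggestion, written as a dry run that only prints the command rather than launching anything. The interface name eth0 and the elided host 210...101 are the ones from this thread; build_cmd is just a placeholder helper of mine, and I'm assuming both parameters accept an interface name, as the FAQ entry describes.

```shell
#!/bin/sh
# Dry run: prints the mpirun command line instead of executing it.
# build_cmd is a hypothetical helper, not part of Open MPI.
build_cmd() {
    iface="$1"   # interface to restrict both layers to
    host="$2"    # target host, passed through unchanged
    printf '%s\n' "mpirun --mca oob_tcp_if_include $iface --mca btl_tcp_if_include $iface -np 1 -host $host hostname"
}

# "210...101" is the elided address used in this thread, kept as-is.
build_cmd eth0 "210...101"
```

With only eth0 to pick from, I'd expect this to select the same interface as the default, which is why I doubt it changes which address gets used.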
Or maybe I'm just not understanding correctly.

---

On Wed, Oct 5, 2011 at 6:13 PM, Jeff Squyres <jsquy...@cisco.com> wrote:

> Check out this FAQ entry:
>
> http://www.open-mpi.org/faq/?category=tcp#tcp-selection
>
> Note that there are btl_tcp_if_include / btl_tcp_if_exclude: these control MPI-level communications. There's also oob_tcp_if_include / oob_tcp_if_exclude (that take the same kinds of values as btl_tcp_if_include/exclude) that control OMPI's run-time environment communications.
>
> On Oct 5, 2011, at 12:01 PM, (.-=Kiwi=-.) wrote:
>
> > "OMPI always tries to use the lowest numbered address first - just a natural ordering."
> >
> > That doesn't seem to be the reason. We changed the private IPs to 212... (a higher number than the public 210... IPs) and still MPI tries to go to 212 afterwards.
> >
> > We're reading the oob_tcp and btl_tcp parameters but we're not sure how to do it.
> >
> > "But if hello world doesn't even run, then try running with "mpirun --mca oob_tcp_if_include <the interface(s) you want to use> ...", per Ralph's suggestion. If *that* doesn't work, also add "--mca btl_tcp_if_include ..." as well."
> >
> > We tried doing from Computer 1:
> >
> > orterun -mca oob_tcp_debug 1 -np 1 -host 212...3 ifconfig
> >
> > and everything was ok
> >
> > We tried doing from Computer 1:
> >
> > orterun -mca oob_tcp_debug 1 -np 1 -host 210...101 ifconfig
> >
> > and it says:
> >
> > There are no allocated resources for the application
> >   ifconfig
> > that match the requested mapping:
> >
> >
> > Verify that you have mapped the allocated resources properly using the
> > --host or --hostfile specification.
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
> > launch so we are aborting. [...]
> >
> > Any other ideas?
> >
> > On Wed, Oct 5, 2011 at 1:54 AM, Ralph Castain <rhc.open...@gmail.com> wrote:
> > OMPI always tries to use the lowest numbered address first - just a natural ordering. You need to tell it to use just the public ones for this topology. Use the oob_tcp and btl_tcp parameters to do this. See "ompi_info --param oob tcp" and "ompi_info --param btl tcp" for the exact syntax.
> >
> > Sent from my iPad
> >
> > On Oct 4, 2011, at 10:21 AM, "(.-=Kiwi=-.)" <heffe...@gmail.com> wrote:
> >
> >> We are constructing a set of computers with Open MPI and there's a small problem with mixing public and private IPs.
> >>
> >> We aren't sure about what's causing the problem or how to solve it.
> >>
> >> The files are shared thanks to NFS and we have a couple computers with private IPs and public IPs that we want them to send MPI work to some machines that have public IPs.
> >>
> >> I'm going to try to describe with example IPs.
> >>
> >> Computer 1 sees itself as eth0: 172...2 but has a public IP assigned: 210...2
> >> Computer 2 sees itself as eth0: 172...3 but has a public IP assigned: 210...3
> >> Computers outside the subnet directly have public IPs assigned: 210...100+
> >>
> >> The computers outside see Computer 1 and 2 only with 210... they can't see the 172... internal IPs.
> >>
> >> If an outside computer launches mpirun to Computer 1, it works ok.
> >> If Computer 1 tries to launch mpirun to Computer 2 (with 172...) it also works ok (not with 210... because they don't know that that's their public IP, but that's not an issue).
> >>
> >> The problem comes when Computer 1 or 2 try to launch mpirun to outside computers.
> >>
> >> We tried to check out what was happening and installed wireshark on an outside computer and it seems that the ssh part works ok (the ssh talk between 210...2 and 210...101 is ok), but after that the outside computer tries to send a TCP SYN package to 172...2 instead of 210...2 and the rest of the packets onward the same.
> >>
> >> Is there a way to solve this problem?
> >>
> >> I've read this ( http://www.open-mpi.org/community/lists/users/2009/11/11184.php ) but I'm not really sure what he's doing there.
> >>
> >> We have the option of plugging Computer 1 and Computer 2 directly to the switch that the outside computers are on, but we'd rather not because we'd prefer the computers to stay on the private network, but if there's no other way, I guess we can.
> >>
> >> Can it be done without having to change the network topology?
> >>
> >> Thanks in advance.
> >> _______________________________________________
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/