The real solution is to evict the private addresses from both levels (MPI and ORTE). However, based on the ordering of the interfaces, I guess you cannot do it by name (eth0 has a private address on one side but a public one on the other).
No panic! There is support for this. Look at the output of "ompi_info --param btl tcp" attached below:

> MCA btl: parameter "btl_tcp_if_include" (current value: <none>, data
>          source: default value)
>          Comma-delimited list of devices or CIDR notation of networks
>          to use for MPI communication (e.g., "eth0,eth1" or
>          "192.168.0.0/16,10.1.4.0/24"). Mutually exclusive with
>          btl_tcp_if_exclude.
> MCA btl: parameter "btl_tcp_if_exclude" (current value: <lo,sppp>, data
>          source: default value)
>          Comma-delimited list of devices or CIDR notation of networks
>          to NOT use for MPI communication -- all devices not matching
>          these specifications will be used (e.g., "eth0,eth1" or
>          "192.168.0.0/16,10.1.4.0/24"). Mutually exclusive with
>          btl_tcp_if_include.

You can use the [btl|oob]_tcp_if_[include|exclude] parameters either with interface names or with IP ranges. Add the following to your mpirun command line:

  --mca btl_tcp_if_include "210.0.0.0/8" --mca oob_tcp_if_include "210.0.0.0/8"

and everything should work in all cases.

  george.

On Oct 5, 2011, at 12:13 , Jeff Squyres wrote:

> Check out this FAQ entry:
>
>   http://www.open-mpi.org/faq/?category=tcp#tcp-selection
>
> Note that there are btl_tcp_if_include / btl_tcp_if_exclude: these control
> MPI-level communications. There's also oob_tcp_if_include /
> oob_tcp_if_exclude (that take the same kinds of values as
> btl_tcp_if_include/exclude) that control OMPI's run-time environment
> communications.
>
>
> On Oct 5, 2011, at 12:01 PM, (.-=Kiwi=-.) wrote:
>
>> "OMPI always tries to use the lowest numbered address first - just a natural
>> ordering."
>>
>> That doesn't seem to be the reason. We changed the private IPs to 212... (a
>> higher number than the public 210... IPs) and still MPI tries to go to 212
>> afterwards.
>>
>> We're reading the oob_tcp and btl_tcp parameters but we're not sure how to
>> do it.
>>
>> "But if hello world doesn't even run, then try running with "mpirun --mca
>> oob_tcp_if_include <the interface(s) you want to use> ...", per Ralph's
>> suggestion. If *that* doesn't work, also add "--mca btl_tcp_if_include ..."
>> as well."
>>
>> We tried doing from Computer 1:
>>
>> orterun -mca oob_tcp_debug 1 -np 1 -host 212...3 ifconfig
>>
>> and everything was ok
>>
>> We tried doing from Computer 1:
>>
>> orterun -mca oob_tcp_debug 1 -np 1 -host 210...101 ifconfig
>>
>> and it says:
>>
>> There are no allocated resources for the application
>> ifconfig
>> that match the requested mapping:
>>
>>
>> Verify that you have mapped the allocated resources properly using the
>> --host or --hostfile specification.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
>> launch so we are aborting. [...]
>>
>> Any other ideas?
>>
>>
>> On Wed, Oct 5, 2011 at 1:54 AM, Ralph Castain <rhc.open...@gmail.com> wrote:
>> OMPI always tries to use the lowest numbered address first - just a natural
>> ordering. You need to tell it to use just the public ones for this topology.
>> Use the oob_tcp and btl_tcp parameters to do this. See "ompi_info --param
>> oob tcp" and "ompi_info --param btl tcp" for the exact syntax.
>>
>>
>> Sent from my iPad
>>
>> On Oct 4, 2011, at 10:21 AM, "(.-=Kiwi=-.)" <heffe...@gmail.com> wrote:
>>
>>> We are constructing a set of computers with Open MPI and there's a small
>>> problem with mixing public and private IPs.
>>>
>>> We aren't sure about what's causing the problem or how to solve it.
>>>
>>> The files are shared thanks to NFS and we have a couple computers with
>>> private IPs and public IPs that we want them to send MPI work to some
>>> machines that have public IPs.
>>>
>>> I'm going to try to describe with example IPs.
>>>
>>> Computer 1 sees itself as eth0: 172...2 but has a public IP assigned:
>>> 210...2
>>> Computer 2 sees itself as eth0: 172...3 but has a public IP assigned:
>>> 210...3
>>> Computers outside the subnet directly have public IPs assigned: 210...100+
>>>
>>> The computers outside see Computer 1 and 2 only with 210... they can't see
>>> the 172... internal IPs.
>>>
>>> If an outside computer launches mpirun to Computer 1, it works ok.
>>> If Computer 1 tries to launch mpirun to Computer 2 (with 172...) it also
>>> works ok (not with 210... because they don't know that that's their public
>>> IP, but that's not an issue).
>>>
>>> The problem comes when Computer 1 or 2 tries to launch mpirun to outside
>>> computers.
>>>
>>> We tried to check out what was happening and installed wireshark on an
>>> outside computer and it seems that the ssh part works ok (the ssh talk
>>> between 210...2 and 210...101 is ok), but after that the outside computer
>>> tries to send a TCP SYN packet to 172...2 instead of 210...2, and the
>>> packets that follow go to the same address.
>>>
>>> Is there a way to solve this problem?
>>>
>>> I've read this (
>>> http://www.open-mpi.org/community/lists/users/2009/11/11184.php ) but I'm
>>> not really sure what he's doing there.
>>>
>>> We have the option of plugging Computer 1 and Computer 2 directly to the
>>> switch that the outside computers are on, but we'd rather not because we'd
>>> prefer the computers to stay on the private network, but if there's no
>>> other way, I guess we can.
>>>
>>> Can it be done without having to change the network topology?
>>>
>>> Thanks in advance.
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
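
For anyone who wants the include-by-range suggestion from the top of this thread in a reusable form, here is a minimal sketch. It assumes the public network really is 210.0.0.0/8 as quoted above. The helper name mca_flags, the -np value, the host placeholder and the program name are made up for illustration. The environment-variable form relies on Open MPI's OMPI_MCA_<parameter name> convention. The script only prints the mpirun command rather than running it.

```shell
#!/bin/sh
# Sketch only: restrict both Open MPI layers to one network by CIDR range.
# NET is the range quoted in the thread; mca_flags is a hypothetical helper.
NET="210.0.0.0/8"

# Build the two --mca options: btl_tcp_* covers MPI traffic,
# oob_tcp_* covers the run-time (ORTE) traffic.
mca_flags() {
    echo "--mca btl_tcp_if_include $1 --mca oob_tcp_if_include $1"
}

# Alternative: set the same parameters through the environment
# (OMPI_MCA_<parameter name>), so later mpirun calls pick them up.
export OMPI_MCA_btl_tcp_if_include="$NET"
export OMPI_MCA_oob_tcp_if_include="$NET"

# Print the command instead of running it; the -np value, host placeholder
# and program name are illustrative, not taken from the thread.
echo "mpirun $(mca_flags "$NET") -np 2 -host <public-host> ./hello"
```

Either form should be enough on its own. Because include and exclude are mutually exclusive for each layer, do not set btl_tcp_if_exclude or oob_tcp_if_exclude at the same time.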