The current version of Open MPI doesn't handle such situations. You either have to configure your NAT differently or try to get your hands on one of the NAT-aware versions as described here http://www-lipn.univ-paris13.fr/~coti/QosCosGrid/qcgompi.php.
george.

On Oct 10, 2011, at 12:14 , (.-=Kiwi=-.) wrote:

> I'm confused... my IPs right now are:
>
> Computer 1 (192.168.31.2 internal / 210.1.1.39 external)
> Computer 2 (192.168.31.3 internal / 210.1.1.40 external)
> Computer 3 (210.1.1.137)
>
> I want Computer 1 to launch mpirun and Computer 3 to do the task.
>
> I tried both these commands first on Computer 1 and then also on Computer 3:
>
> ompi_info --mca btl_tcp_if_include "210.0.0.0/8" --mca oob_tcp_if_include
> "210.0.0.0/8" (didn't work, Computer 3 tries to answer to 192.168.31.2
> instead of 210.1.1.39)
> ompi_info --mca btl_tcp_if_include "210.1.1.0/8" --mca oob_tcp_if_include
> "210.1.1.0/8" (the same, still answering to the wrong IP).
>
> What am I doing wrong?
>
> ---
>
>
>
> On Wed, Oct 5, 2011 at 8:08 PM, George Bosilca <bosi...@eecs.utk.edu> wrote:
> The real solution is to evict the private addresses from both levels (MPI and
> ORTE). However, based on the ordering of the interfaces, I guess you cannot
> do it by name (eth0 has a private address on one side but a public one on the
> other).
>
> No panic! There is support for this.
>
> Look at the output of "ompi_info --param btl tcp" attached below:
>
> > MCA btl: parameter "btl_tcp_if_include" (current value: <none>, data
> > source: default value)
> > Comma-delimited list of devices or CIDR notation of networks
> > to use for MPI communication (e.g., "eth0,eth1" or
> > "192.168.0.0/16,10.1.4.0/24"). Mutually exclusive with
> > btl_tcp_if_exclude.
> > MCA btl: parameter "btl_tcp_if_exclude" (current value: <lo,sppp>, data
> > source: default value)
> > Comma-delimited list of devices or CIDR notation of networks
> > to NOT use for MPI communication -- all devices not matching
> > these specifications will be used (e.g., "eth0,eth1" or
> > "192.168.0.0/16,10.1.4.0/24"). Mutually exclusive with
> > btl_tcp_if_include.
>
> You can use the [btl|oob]_tcp_if_[include|exclude] either with names or with
> IP ranges.
> Add the following to your mpirun:
>
> --mca btl_tcp_if_include "210.0.0.0/8" --mca oob_tcp_if_include "210.0.0.0/8"
>
> and everything should work in all cases.
>
> george.
>
> On Oct 5, 2011, at 12:13 , Jeff Squyres wrote:
>
> > Check out this FAQ entry:
> >
> > http://www.open-mpi.org/faq/?category=tcp#tcp-selection
> >
> > Note that there are btl_tcp_if_include / btl_tcp_if_exclude: these control
> > MPI-level communications. There's also oob_tcp_if_include /
> > oob_tcp_if_exclude (that take the same kinds of values as
> > btl_tcp_if_include/exclude) that control OMPI's run-time environment
> > communications.
> >
> >
> > On Oct 5, 2011, at 12:01 PM, (.-=Kiwi=-.) wrote:
> >
> >> "OMPI always tries to use the lowest numbered address first - just a
> >> natural ordering."
> >>
> >> That doesn't seem to be the reason. We changed the private IPs to 212...
> >> (a higher number than the public 210... IPs) and still MPI tries to go to
> >> 212 afterwards.
> >>
> >> We're reading the oob_tcp and btl_tcp parameters but we're not sure how to
> >> do it.
> >>
> >> "But if hello world doesn't even run, then try running with "mpirun --mca
> >> oob_tcp_if_include <the interface(s) you want to use> ...", per Ralph's
> >> suggestion. If *that* doesn't work, also add "--mca btl_tcp_if_include
> >> ..." as well."
> >>
> >> We tried doing from Computer 1:
> >>
> >> orterun -mca oob_tcp_debug 1 -np 1 -host 212...3 ifconfig
> >>
> >> and everything was ok
> >>
> >> We tried doing from Computer 1:
> >>
> >> orterun -mca oob_tcp_debug 1 -np 1 -host 210...101 ifconfig
> >>
> >> and it says:
> >>
> >> There are no allocated resources for the application
> >> ifconfig
> >> that match the requested mapping:
> >>
> >>
> >> Verify that you have mapped the allocated resources properly using the
> >> --host or --hostfile specification.
> >> --------------------------------------------------------------------------
> >> --------------------------------------------------------------------------
> >> A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
> >> launch so we are aborting. [...]
> >>
> >> Any other ideas?
> >>
> >>
> >> On Wed, Oct 5, 2011 at 1:54 AM, Ralph Castain <rhc.open...@gmail.com>
> >> wrote:
> >> OMPI always tries to use the lowest numbered address first - just a
> >> natural ordering. You need to tell it to use just the public ones for this
> >> topology. Use the oob_tcp and btl_tcp parameters to do this. See
> >> "ompi_info --param oob tcp" and "ompi_info --param btl tcp" for the exact
> >> syntax.
> >>
> >>
> >> Sent from my iPad
> >>
> >> On Oct 4, 2011, at 10:21 AM, "(.-=Kiwi=-.)" <heffe...@gmail.com> wrote:
> >>
> >>> We are constructing a set of computers with Open MPI and there's a small
> >>> problem with mixing public and private IPs.
> >>>
> >>> We aren't sure about what's causing the problem or how to solve it.
> >>>
> >>> The files are shared thanks to NFS and we have a couple computers with
> >>> private IPs and public IPs that we want them to send MPI work to some
> >>> machines that have public IPs.
> >>>
> >>> I'm going to try to describe with example IPs.
> >>>
> >>> Computer 1 sees itself as eth0: 172...2 but has a public IP assigned:
> >>> 210...2
> >>> Computer 2 sees itself as eth0: 172...3 but has a public IP assigned:
> >>> 210...3
> >>> Computers outside the subnet directly have public IPs assigned:
> >>> 210...100+
> >>>
> >>> The computers outside see Computer 1 and 2 only with 210... they can't
> >>> see the 172... internal IPs.
> >>>
> >>> If an outside computer launches mpirun to Computer 1, it works ok.
> >>> If Computer 1 tries to launch mpirun to Computer 2 (with 172...) it also
> >>> works ok (not with 210... because they don't know that that's their
> >>> public IP, but that's not an issue).
> >>>
> >>> The problem comes when Computer 1 or 2 try to launch mpirun to outside
> >>> computers.
> >>>
> >>> We tried to check out what was happening and installed wireshark on an
> >>> outside computer and it seems that the ssh part works ok (the ssh talk
> >>> between 210...2 and 210...101 is ok), but after that the outside computer
> >>> tries to send a TCP SYN packet to 172...2 instead of 210...2 and the
> >>> rest of the packets onward the same.
> >>>
> >>> Is there a way to solve this problem?
> >>>
> >>> I've read this (
> >>> http://www.open-mpi.org/community/lists/users/2009/11/11184.php ) but I'm
> >>> not really sure what he's doing there.
> >>>
> >>> We have the option of plugging Computer 1 and Computer 2 directly to the
> >>> switch that the outside computers are on, but we'd rather not because
> >>> we'd prefer the computers to stay on the private network, but if there's
> >>> no other way, I guess we can.
> >>>
> >>> Can it be done without having to change the network topology?
> >>>
> >>> Thanks in advance.
> >>> _______________________________________________
> >>> users mailing list
> >>> us...@open-mpi.org
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
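
For readers skimming the thread: the suggestion from George and Jeff is to pass both interface-selection parameters to mpirun itself. ompi_info only displays parameters, so as far as I can tell the two ompi_info invocations quoted above would not change how a later mpirun behaves. Below is a minimal sketch of the intended command line, using the addresses from the thread. The names PUBLIC_NET, TARGET_HOST and build_mpirun_cmd are illustrative only, and the script just prints the command instead of running it. Note also that, per George's reply at the top, restricting the interfaces may still not be enough when the launching node sits behind NAT.

```shell
#!/usr/bin/env bash
# Sketch only: build and print an mpirun command line that restricts both the
# MPI layer (btl) and the run-time layer (oob) to one network.
# PUBLIC_NET, TARGET_HOST and build_mpirun_cmd are hypothetical names.

PUBLIC_NET="210.0.0.0/8"     # CIDR range suggested in the thread
TARGET_HOST="210.1.1.137"    # "Computer 3" in the thread

build_mpirun_cmd() {
    local net="$1" host="$2"
    shift 2
    # The --mca options are given to mpirun, not to ompi_info.
    printf '%s ' mpirun \
        --mca btl_tcp_if_include "$net" \
        --mca oob_tcp_if_include "$net" \
        -np 1 -host "$host" "$@"
}

cmd="$(build_mpirun_cmd "$PUBLIC_NET" "$TARGET_HOST" hostname)"
cmd="${cmd% }"   # drop the trailing space left by printf
echo "$cmd"
```

Running the printed command from Computer 1 would be the test George describes; swapping "hostname" for the real MPI program is the only change needed once that works.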