So, I tried out the flag you mentioned, which forces the use of the loopback interface. It worked without error or stalling:
$ mpirun --mca oob_tcp_if_include lo0 -np 2 ./hello_cxx
Hello, world!  I am 0 of 2 (Open MPI v1.7.3, package: Open MPI macpo...@meredithk-mac.corp.fmglobal.com Distribution, ident: 1.7.3, Oct 17, 2013, 117)
Hello, world!  I am 1 of 2 (Open MPI v1.7.3, package: Open MPI macpo...@meredithk-mac.corp.fmglobal.com Distribution, ident: 1.7.3, Oct 17, 2013, 117)

Thanks for all your help!

Karl

On Dec 4, 2013, at 8:23 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> On Dec 4, 2013, at 7:25 AM, "Meredith, Karl" <karl.mered...@fmglobal.com> wrote:
>
>> Before turning off my firewall, I have these rules
>>
>> $ sudo ipfw list
>> Password:
>> 05000 allow ip from any to any via lo*
>
> This is an interesting rule. Perhaps you can try:
>
>     mpirun --mca oob_tcp_if_include lo0 ...
>
> which would force OMPI to use the loopback interface for TCP connections (it's normally excluded, because it's not viable for off-node communications). This would only be useful for single-node runs, of course.
>
>> Our local IT expert believes that this problem is related to this bug from way back in Open MPI 1.2.3, but it seems like the patch was never implemented:
>> http://www.open-mpi.org/community/lists/users/2007/05/3344.php
>
> No, I don't believe that's the issue. Here's why:
>
> - OMPI currently ignores loopback interfaces by default. This is done because the norm is to have multi-server runs, and loopback interfaces are not useful for such runs. Put differently: OMPI defaults to using external IP interfaces.
>
> - However, all your external IP interfaces are firewalled. So when OMPI tries to make a loopback connection on the external IP interfaces, it's blocked. Kaboom. This also makes it easy to understand why it works when you disable the firewall.
>
> - The bug report you cited (good research, BTW!) is due to a problem we had parsing the oob_tcp_if_include MCA parameter way back in the 1.2.x series, which has since been fixed.
> The user was trying to explicitly tell OMPI "use the lo0 interface" (i.e., override the default of *not* using the lo0 interface), and the bug prevented that from working. That bug has long since been fixed: you can override OMPI's default of not using lo0. You should then be able to run without disabling your firewall (that's what the mpirun syntax I cited above is doing).
>
> - As noted above, using lo0 for multi-server runs is a bad idea; it won't work (OMPI may get confused and think that it can use 127.0.0.0/8 to contact multiple servers, because by the netmask, it hypothetically can). But you can do it for runs limited to your local laptop with no problem.
>
> - The real solution, as Ralph implied, is to stop using external IP interfaces for single-server control messages (we talked about this off-list). Let me explain this statement a bit... OMPI has two main channels for communication: (a) control messages and (b) MPI traffic. MPI traffic is already smart enough to use shared memory for single-server MPI traffic and some form of network for off-server MPI traffic. The control message plane doesn't currently make that distinction -- it uses IP interfaces for *all* traffic (and defaults to not using loopback interfaces), regardless of destination. So the real solution is to make the control message plane a little smarter: put a named unix domain socket in the filesystem on the local server and let local control messages use that (instead of external IP addresses). FWIW, this is what LAM/MPI used to do; we just never adopted that into Open MPI (LAM/MPI was one of Open MPI's predecessors).
>
> This feature may take a little time to implement, and may or may not make it into the v1.7.x series.
> But you should be able to use the oob_tcp_if_include MCA param in the meantime (see the FAQ for different ways to set MCA params; you can stick it in an environment variable or file instead of manually including it on the mpirun command line all the time, if that's more convenient).
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
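To see Jeff's firewall point in isolation: a TCP connection made over the loopback interface succeeds under a rule like "05000 allow ip from any to any via lo*", regardless of how the external interfaces are filtered. Here is a minimal sketch in plain Python (not OMPI code) of a local "control message" exchanged over 127.0.0.1:

```python
import socket
import threading

# Bind a listener on the loopback interface; firewall rules like
# "allow ip from any to any via lo*" permit this traffic even when
# all external interfaces are blocked.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

def accept_and_echo():
    conn, _ = server.accept()
    conn.sendall(conn.recv(1024))  # echo the control message back
    conn.close()

t = threading.Thread(target=accept_and_echo)
t.start()

# A local "control message" over loopback succeeds with no firewall change.
client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"ping")
reply = client.recv(1024)
client.close()
t.join()
server.close()
print(reply.decode())  # prints "ping"
```

An OMPI process trying the same handshake on a firewalled external interface would instead hang or be refused, which matches the stalling behavior reported earlier in the thread.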
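The named-unix-domain-socket approach Jeff describes for local control messages can be sketched as follows. This is an illustration of the general technique, not Open MPI's actual implementation; the socket filename and message contents here are made up:

```python
import os
import socket
import tempfile
import threading

# A named unix domain socket lives in the filesystem, so local control
# messages never touch an IP interface (and thus never hit the firewall).
# The path below is hypothetical, purely for illustration.
sock_path = os.path.join(tempfile.mkdtemp(), "ompi-control.sock")

server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(sock_path)             # the socket appears as a filesystem entry
server.listen(1)

def serve_one():
    conn, _ = server.accept()
    conn.sendall(b"ack:" + conn.recv(1024))
    conn.close()

t = threading.Thread(target=serve_one)
t.start()

# No IP address is involved anywhere in this exchange.
client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(sock_path)
client.sendall(b"launch")
reply = client.recv(1024)
client.close()
t.join()
server.close()
os.unlink(sock_path)
print(reply.decode())  # prints "ack:launch"
```

This is the distinction Jeff draws: the MPI traffic plane already picks shared memory for on-node peers, and a filesystem socket would give the control plane an equivalent on-node path.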
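On the final point, the two alternatives to the command-line flag that the Open MPI FAQ describes are an environment variable named OMPI_MCA_<param_name> and a per-user parameter file. A small sketch of the environment-variable route (the mpirun invocation is shown as a comment, since it assumes an Open MPI install):

```python
import os

# Ways to set an MCA parameter without typing --mca on every mpirun line:
#
# 1. An environment variable named OMPI_MCA_<param_name>:
os.environ["OMPI_MCA_oob_tcp_if_include"] = "lo0"

# 2. A per-user parameter file, ~/.openmpi/mca-params.conf, containing:
#        oob_tcp_if_include = lo0
#
# With either in place, a plain launch picks the setting up:
#   $ mpirun -np 2 ./hello_cxx
print(os.environ["OMPI_MCA_oob_tcp_if_include"])  # prints "lo0"
```

Exporting OMPI_MCA_oob_tcp_if_include=lo0 in a shell startup file would make the single-node workaround permanent until the unix-domain-socket feature lands.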