On Dec 4, 2013, at 7:25 AM, "Meredith, Karl" <karl.mered...@fmglobal.com> wrote:

> Before turning off my firewall, I have these rules
> 
> $ sudo ipfw list
> Password:
> 05000 allow ip from any to any via lo*

This is an interesting rule.  Perhaps you can try:

    mpirun --mca oob_tcp_if_include lo0 ...

That would force OMPI to use the loopback interface for TCP connections (it's 
normally excluded, because it's not viable for off-node communications).  This 
would only be useful for single-node runs, of course.
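
For example, a purely local run might look something like this (the executable 
name here is just a placeholder for whatever you're running):

    mpirun --mca oob_tcp_if_include lo0 -np 4 ./your_mpi_app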

> Our local IT expert believes that this problem is related to this bug from 
> way back in openmpi 1.2.3, but it seems like the patch was never implemented:
> http://www.open-mpi.org/community/lists/users/2007/05/3344.php

No, I don't believe that's the issue.  Here's why:

- OMPI currently ignores loopback interfaces by default.  This is done because 
the norm is to have multi-server runs, and loopback interfaces are not useful 
for such runs.  Put differently: OMPI defaults to using external IP interfaces.

- However, all your external IP interfaces are firewalled.  So when OMPI tries 
to make a local (same-node) connection over those external IP interfaces, it's 
blocked.  Kaboom.  That also explains why everything works when you disable the 
firewall.

- That bug report you cited (good research, BTW!) stems from a problem we had 
parsing the oob_tcp_if_include MCA parameter way back in the 1.2.x series.  The 
user was trying to explicitly tell OMPI "use the lo0 interface" (i.e., override 
the default of *not* using the lo0 interface), and the bug prevented that from 
working.  That bug has long since been fixed, so you can now override OMPI's 
default of not using lo0, and you should then be able to run without disabling 
your firewall (that's what the mpirun syntax I cited above is doing).

- As noted above, using lo0 for multi-server runs is a bad idea; it won't work 
(OMPI may get confused and think that it can use 127.0.0.0/8 to contact 
multiple servers, because by the netmask, it hypothetically can).  But you can 
do it for runs limited to your local laptop with no problem.

- The real solution, as Ralph implied, is to stop using external IP interfaces 
for single-server control messages (we talked about this off-list).  Let me 
explain this statement a bit...  OMPI has 2 main channels for communication: a) 
control messages and b) MPI traffic.  The MPI traffic side is already smart 
enough to use shared memory within a single server and some form of network for 
off-server traffic.  The control message plane doesn't currently make that 
distinction -- it uses IP interfaces for *all* traffic (and defaults to not 
using loopback interfaces), regardless of destination.  So the real solution is 
to make the control message plane a little smarter: put a named unix domain 
socket in the filesystem on the local server and let local control messages use 
that (instead of external IP addresses).  FWIW, this is what LAM/MPI used to 
do; we just never adopted that into Open MPI (LAM/MPI was one of Open MPI's 
predecessors).

This feature may take a little time to implement, and may or may not make it 
into the v1.7.x series.  But you should be able to use the oob_tcp_if_include 
MCA param in the meantime (see the FAQ for different ways to set MCA params; 
you can stick it in an environment variable or file instead of manually 
including it on the mpirun command line all the time, if that's more 
convenient).
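
For example, either of these should have the same effect as passing --mca on 
the mpirun command line (check the FAQ for the exact syntax and file locations):

    # environment variable (e.g., in your shell startup file)
    export OMPI_MCA_oob_tcp_if_include=lo0

    # or put this line in $HOME/.openmpi/mca-params.conf
    oob_tcp_if_include = lo0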

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
