George,

This is one of the things I tried, and setting the oob interface did not work;
it fails with the error message below.

Also, per this thread:
        http://www.open-mpi.org/community/lists/users/2007/05/3319.php
I believe it is oob_tcp_include, not oob_tcp_if_include. The latter is silently
ignored in 1.2, as far as I can tell.

Interestingly, telling the MPI layer to use lo0 (or to not use tcp at all) works fine. But when I try to do the same for the OOB layer, it complains. The full error is:

[mymac.local:07001] [0,0,0] mca_oob_tcp_init: invalid address '' returned for selected oob interfaces.
[mymac.local:07001] [0,0,0] ORTE_ERROR_LOG: Error in file oob_tcp.c at line 1196

mpirun actually hangs at this point and no processes are spawned. I have to ^C to stop it.
I see this behavior on both Mac OS and on Linux with 1.2.2.
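
For reference, the two invocations being compared can be sketched as follows. This only assembles and prints the command lines; it does not run mpirun. The MCA parameter names are the ones mentioned in this thread, while `./my_app` and the process count of 2 are placeholders, and the helper `mpirun_command` is hypothetical.

```python
import shlex

def mpirun_command(app, nprocs, mca_params):
    """Build an mpirun argv list; mca_params maps MCA parameter name -> value."""
    argv = ["mpirun", "-np", str(nprocs)]
    for name, value in mca_params.items():
        argv += ["--mca", name, value]
    argv.append(app)
    return argv

# Restrict only the MPI (btl) layer to lo0; reported above to work.
btl_only = mpirun_command("./my_app", 2, {"btl_tcp_if_include": "lo0"})

# Also restrict the OOB layer; reported above to fail in 1.2.2.
oob_too = mpirun_command("./my_app", 2, {
    "oob_tcp_include": "lo0",
    "btl_tcp_if_include": "lo0",
})

print(shlex.join(btl_only))
print(shlex.join(oob_too))
```

Printing the commands rather than executing them keeps the sketch usable on a machine without Open MPI installed.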

Bill


George Bosilica wrote:
There are 2 sets of sockets: one for the oob layer and one for the
MPI layer (at least if TCP support is enabled). Therefore, in order
to achieve what you're looking for you should add to the command line
"--mca oob_tcp_if_include lo0 --mca btl_tcp_if_include lo0".
On May 29, 2007, at 3:58 PM, Bill Saphir wrote:


----- original message below ---

We have run into the following problem:

- start up Open MPI application on a laptop
- disconnect from network
- application hangs

I believe that the problem is that all sockets created by Open MPI are bound to the external network interface. For example, when I start up a 2 process MPI job on my Mac (no hosts specified), I get the following tcp
connections. 192.168.5.2 is an address on my LAN.

tcp4 0 0 192.168.5.2.49459 192.168.5.2.49463 ESTABLISHED
tcp4 0 0 192.168.5.2.49463 192.168.5.2.49459 ESTABLISHED
tcp4 0 0 192.168.5.2.49456 192.168.5.2.49462 ESTABLISHED
tcp4 0 0 192.168.5.2.49462 192.168.5.2.49456 ESTABLISHED
tcp4 0 0 192.168.5.2.49456 192.168.5.2.49460 ESTABLISHED
tcp4 0 0 192.168.5.2.49460 192.168.5.2.49456 ESTABLISHED
tcp4 0 0 192.168.5.2.49456 192.168.5.2.49458 ESTABLISHED
tcp4 0 0 192.168.5.2.49458 192.168.5.2.49456 ESTABLISHED
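
A quick way to check which local addresses the sockets are bound to is to tally the local-address column of the netstat output. The sketch below assumes the BSD/Mac OS format shown above, where the port follows the address after a final dot; the sample is a three-line excerpt of that output and the function name `local_addresses` is hypothetical.

```python
from collections import Counter

# Excerpt of the netstat output above (BSD/Mac OS "address.port" format).
SAMPLE = """\
tcp4 0 0 192.168.5.2.49459 192.168.5.2.49463 ESTABLISHED
tcp4 0 0 192.168.5.2.49463 192.168.5.2.49459 ESTABLISHED
tcp4 0 0 192.168.5.2.49456 192.168.5.2.49462 ESTABLISHED
"""

def local_addresses(netstat_text):
    """Count TCP connections per local IP address."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skip headers and non-TCP lines
        # Column 4 is the local endpoint; strip the trailing ".port".
        addr, _, _port = fields[3].rpartition(".")
        counts[addr] += 1
    return counts

counts = local_addresses(SAMPLE)
# Anything outside 127.0.0.0/8 depends on an external interface being up.
non_loopback = {a: n for a, n in counts.items() if not a.startswith("127.")}
print(non_loopback)
```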

Since this application is confined to a single machine, I would like it to use 127.0.0.1, which will remain available as the laptop moves around. I am unable to force it to bind
sockets to this address, however.
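
As an illustration of the desired behavior, here is a minimal sketch using plain sockets in Python. It is not Open MPI code and makes no claim about how the OOB or BTL layers create their sockets; it only shows both ends of a local TCP connection explicitly bound to 127.0.0.1.

```python
import socket

# Listener bound explicitly to the loopback address; port 0 lets the OS pick.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

# The client also binds its local end to loopback before connecting.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.bind(("127.0.0.1", 0))
client.connect(("127.0.0.1", port))
conn, peer = server.accept()

# Local address of each end of the established connection.
local_ends = (client.getsockname()[0], conn.getsockname()[0])
print(local_ends)

for s in (conn, client, server):
    s.close()
```

Because neither end refers to the LAN address, a connection set up this way does not depend on the external interface.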

Some of the things I've tried are:
- explicitly setting the hostname to 127.0.0.1 (--host 127.0.0.1)
- turning off the tcp btl (--mca btl ^tcp) and other variations (--mca btl self,sm)
- using --mca oob_tcp_include lo0

The first two have no effect. The last one results in the following error message:

[myhost.locall:05830] [0,0,0] mca_oob_tcp_init: invalid address '' returned for selected oob interfaces.

Is there any way to force Open MPI to bind all sockets to 127.0.0.1?

As a side question -- I'm curious what all of these tcp connections are used for. As I increase the number of processes, it looks like there are 4 sockets created per MPI process, without using the tcp btl.
Perhaps stdin/out/err + control?

Bill


