We have run into the following problem:

- start up Open MPI application on a laptop
- disconnect from network
- application hangs
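
Concretely, the reproduction is something like this (the binary name is just a placeholder for any small MPI program):

mpirun -np 2 ./hello_mpi    # starts and runs normally
# turn off Wi-Fi / unplug the cable while the job is running
# -> the processes stop making progress and the job hangs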

I believe the problem is that all sockets created by Open MPI are bound to the external network interface. For example, when I start a 2-process MPI job on my Mac (no hosts specified), I get the following TCP connections (192.168.5.2 is an address on my LAN):

tcp4 0 0 192.168.5.2.49459 192.168.5.2.49463 ESTABLISHED
tcp4 0 0 192.168.5.2.49463 192.168.5.2.49459 ESTABLISHED
tcp4 0 0 192.168.5.2.49456 192.168.5.2.49462 ESTABLISHED
tcp4 0 0 192.168.5.2.49462 192.168.5.2.49456 ESTABLISHED
tcp4 0 0 192.168.5.2.49456 192.168.5.2.49460 ESTABLISHED
tcp4 0 0 192.168.5.2.49460 192.168.5.2.49456 ESTABLISHED
tcp4 0 0 192.168.5.2.49456 192.168.5.2.49458 ESTABLISHED
tcp4 0 0 192.168.5.2.49458 192.168.5.2.49456 ESTABLISHED
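
(For reference, I collected that listing with something along the lines of "netstat -an -f inet | grep ESTABLISHED"; the exact flags may vary.)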

Since this application is confined to a single machine, I would like it to use 127.0.0.1, which remains available as the laptop moves around. However, I have not been able to force it to bind its sockets to that address.

Some of the things I've tried are:
- explicitly setting the hostname to 127.0.0.1 (--host 127.0.0.1)
- turning off the tcp btl (--mca btl ^tcp) and other variations (--mca btl self,sm)
- using --mca oob_tcp_include lo0

The first two have no effect. The last one results in the following error message:

[myhost.locall:05830] [0,0,0] mca_oob_tcp_init: invalid address '' returned for selected oob interfaces

Is there any way to force Open MPI to bind all sockets to 127.0.0.1?
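
For concreteness, the kind of invocation I am hoping for would look something like the ones below. The parameter spellings are guesses on my part: btl_tcp_if_include and oob_tcp_if_include are the interface-selection parameters as I understand them, some releases may spell the OOB one oob_tcp_include instead, and lo0 is the loopback interface name on the Mac (lo on Linux).

# keep the tcp btl but restrict it, and the oob, to loopback
mpirun -np 2 --mca btl_tcp_if_include lo0 --mca oob_tcp_if_include lo0 ./hello_mpi

# or drop the tcp btl entirely and only restrict the oob
mpirun -np 2 --mca btl self,sm --mca oob_tcp_if_include lo0 ./hello_mpi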

As a side question: I'm curious what all of these TCP connections are used for. As I increase the number of processes, it looks like there are four sockets created per MPI process, even without using the tcp btl. Perhaps stdin/out/err plus a control channel?

Bill

