George,
This is one of the things I tried, and setting the oob interface did not
work; it failed with the error message below.
Also, per this thread:
http://www.open-mpi.org/community/lists/users/2007/05/3319.php
I believe it is oob_tcp_include, not oob_tcp_if_include. The latter is
silently ignored in 1.2, as far as I can tell.
Interestingly, telling the MPI layer to use lo0 (or to not use tcp at all)
works fine. But when I try to do the same for the OOB layer, it complains.
The full error is:
[mymac.local:07001] [0,0,0] mca_oob_tcp_init: invalid address '' returned for selected oob interfaces.
[mymac.local:07001] [0,0,0] ORTE_ERROR_LOG: Error in file oob_tcp.c at line 1196
mpirun then hangs and no processes are spawned; I have to ^C to stop it.
I see this behavior with 1.2.2 on both Mac OS and Linux.
Bill
George Bosilca wrote:
There are 2 sets of sockets: one for the oob layer and one for the
MPI layer (at least if TCP support is enabled). Therefore, in order
to achieve what you're looking for you should add to the command line
"--mca oob_tcp_if_include lo0 --mca btl_tcp_if_include lo0".
On May 29, 2007, at 3:58 PM, Bill Saphir wrote:
----- original message below ---
We have run into the following problem:
- start up Open MPI application on a laptop
- disconnect from network
- application hangs
I believe that the problem is that all sockets created by Open MPI are
bound to the external network interface.
For example, when I start up a 2 process MPI job on my Mac (no hosts
specified), I get the following tcp connections. 192.168.5.2 is an
address on my LAN.
tcp4  0  0  192.168.5.2.49459  192.168.5.2.49463  ESTABLISHED
tcp4  0  0  192.168.5.2.49463  192.168.5.2.49459  ESTABLISHED
tcp4  0  0  192.168.5.2.49456  192.168.5.2.49462  ESTABLISHED
tcp4  0  0  192.168.5.2.49462  192.168.5.2.49456  ESTABLISHED
tcp4  0  0  192.168.5.2.49456  192.168.5.2.49460  ESTABLISHED
tcp4  0  0  192.168.5.2.49460  192.168.5.2.49456  ESTABLISHED
tcp4  0  0  192.168.5.2.49456  192.168.5.2.49458  ESTABLISHED
tcp4  0  0  192.168.5.2.49458  192.168.5.2.49456  ESTABLISHED
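As a rough way to read the listing above: every connection appears twice, once from each end, because both endpoints live on the same host. The Python sketch below (not part of Open MPI; the helper names are made up for illustration) collapses the eight lines into unique connections and tallies how many connections touch each port. It assumes the BSD-style `address.port` endpoint format shown above.

```python
from collections import Counter

# The eight ESTABLISHED lines from the listing above, one per line.
NETSTAT_SAMPLE = """\
tcp4 0 0 192.168.5.2.49459 192.168.5.2.49463 ESTABLISHED
tcp4 0 0 192.168.5.2.49463 192.168.5.2.49459 ESTABLISHED
tcp4 0 0 192.168.5.2.49456 192.168.5.2.49462 ESTABLISHED
tcp4 0 0 192.168.5.2.49462 192.168.5.2.49456 ESTABLISHED
tcp4 0 0 192.168.5.2.49456 192.168.5.2.49460 ESTABLISHED
tcp4 0 0 192.168.5.2.49460 192.168.5.2.49456 ESTABLISHED
tcp4 0 0 192.168.5.2.49456 192.168.5.2.49458 ESTABLISHED
tcp4 0 0 192.168.5.2.49458 192.168.5.2.49456 ESTABLISHED
"""


def split_endpoint(endpoint):
    """Split a BSD-style 'a.b.c.d.port' endpoint into (address, port)."""
    address, _, port = endpoint.rpartition(".")
    return address, int(port)


def unique_connections(netstat_text):
    """Return the set of connections, ignoring which end reported them."""
    connections = set()
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) != 6 or fields[5] != "ESTABLISHED":
            continue  # skip headers, blank lines, and other socket states
        local = split_endpoint(fields[3])
        remote = split_endpoint(fields[4])
        # A frozenset makes (A, B) and (B, A) compare equal.
        connections.add(frozenset((local, remote)))
    return connections


def connections_per_port(connections):
    """Count how many unique connections involve each port."""
    counts = Counter()
    for connection in connections:
        for _address, port in connection:
            counts[port] += 1
    return counts


if __name__ == "__main__":
    conns = unique_connections(NETSTAT_SAMPLE)
    print(len(conns), "unique connections")
    for port, count in sorted(connections_per_port(conns).items()):
        print(port, count)
```

The tally only describes this one listing; it does not say what Open MPI uses each connection for.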
Since this application is confined to a single machine, I would like it
to use 127.0.0.1, which will remain available as the laptop moves around.
I am unable to force it to bind sockets to this address, however.
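To make the desired behavior concrete, here is a minimal Python sketch of a TCP connection whose two ends are both explicitly bound to 127.0.0.1. This is plain socket code, not anything from Open MPI, and the function name `loopback_pair` is made up for illustration; it only shows what "bound to the loopback address" means at the socket level.

```python
import socket


def loopback_pair():
    """Create a connected TCP pair with both ends bound to 127.0.0.1."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Binding before connect() pins the source address to loopback
    # instead of leaving the choice to the routing table.
    client.bind(("127.0.0.1", 0))
    client.connect(listener.getsockname())

    server, _peer = listener.accept()
    listener.close()
    return client, server


if __name__ == "__main__":
    client, server = loopback_pair()
    print("client end:", client.getsockname())
    print("server end:", server.getsockname())
    client.sendall(b"ping")
    print("received:", server.recv(4))
    client.close()
    server.close()
```

A pair like this would show 127.0.0.1 in both address columns of netstat rather than the LAN address.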
Some of the things I've tried are:
- explicitly setting the hostname to 127.0.0.1 (--host 127.0.0.1)
- turning off the tcp btl (--mca btl ^tcp) and other variations (--mca btl self,sm)
- using --mca oob_tcp_include lo0
The first two have no effect. The last one results in an error message of:
[myhost.locall:05830] [0,0,0] mca_oob_tcp_init: invalid address '' returned for selected oob interfaces.
Is there any way to force Open MPI to bind all sockets to 127.0.0.1?
As a side question -- I'm curious what all of these tcp connections are
used for. As I increase the number of processes, it looks like there are
4 sockets created per MPI process, without using the tcp btl.
Perhaps stdin/out/err + control?
Bill