Hi there,

My application uses MPI to run parallel jobs on a single node, so I do not 
need any support for communication between nodes.  However, when I use 
mpirun to launch my application, I see strange errors such as:

--------------------------------------------------------------------------
No network interfaces were found for out-of-band communications. We require
at least one available network for out-of-band messaging.
--------------------------------------------------------------------------

[nid23206:10697] [[33772,1],0] ORTE_ERROR_LOG: Unable to open a TCP socket for 
out-of-band communications in file oob_tcp_listener.c at line 113
[nid23206:10697] [[33772,1],0] ORTE_ERROR_LOG: Unable to open a TCP socket for 
out-of-band communications in file oob_tcp_component.c at line 584
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_oob_base_select failed
  --> Returned value (null) (-43) instead of ORTE_SUCCESS
--------------------------------------------------------------------------

/home/leeping/opt/qchem-4.2/ext-libs/openmpi/lib/libmpi.so.1(+0xfeaa9)[0x2b77e9de5aa9]
/home/leeping/opt/qchem-4.2/ext-libs/openmpi/lib/libmpi.so.1(ompi_btl_openib_connect_base_select_for_local_port+0xd0)[0x2b77e9de13a0]

In each case, Open MPI appears to be trying to use some networking-related 
feature and crashing as a result.  My workaround is to work out which components 
are crashing and disable them through environment variables, like this:

export OMPI_MCA_btl=self,sm
export OMPI_MCA_oob=^tcp
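
For reference, the same two settings can also be passed per invocation with 
mpirun's --mca option instead of through the environment.  A minimal sketch, 
assuming a POSIX shell; the wrapper name run_local_mpi and the arguments 
"-np 4 ./my_app" are placeholders of mine, not part of Open MPI:

```shell
# Hypothetical helper: launch a program with the same component
# restrictions as the two OMPI_MCA_* variables above.
run_local_mpi() {
    # "--mca <param> <value>" mirrors "export OMPI_MCA_<param>=<value>".
    # '^tcp' is quoted so the shell passes the caret through unchanged.
    mpirun --mca btl self,sm --mca oob '^tcp' "$@"
}

# Example call (placeholder arguments):
#   run_local_mpi -np 4 ./my_app
```

This only changes where the settings are given; it selects exactly the same 
components as the exported variables.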

Is there a better way to do this, i.e. to explicitly prohibit Open MPI from 
using any network-related feature and run only on the local node?

Thanks,

- Lee-Ping
