On Fri, Jul 08, 2011 at 02:19:27PM -0400, Jeff Squyres wrote:
> 
> The easiest way to fix this is likely to use the btl_tcp_if_include
> or btl_tcp_if_exclude MCA parameters -- i.e., tell OMPI exactly
> which interfaces to use:
> 
>     http://www.open-mpi.org/faq/?category=tcp#tcp-selection
> 

Perhaps I'm again misreading the output, but it appears that
1.4.4rc2 does not even see the 2nd NIC.

hpc:kargl[317] ifconfig bge0 
bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
    ether 00:e0:81:40:48:92
    inet 10.208.78.111 netmask 0xffffff00 broadcast 10.208.78.255
    inet6 fe80::2e0:81ff:fe40:4892%bge0 prefixlen 64 scopeid 0x3 
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active
hpc:kargl[318] ifconfig bge1
bge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
    ether 00:e0:81:40:48:93
    inet 192.168.0.10 netmask 0xffffff00 broadcast 192.168.0.255
    inet6 fe80::2e0:81ff:fe40:4893%bge1 prefixlen 64 scopeid 0x4 
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active

kargl[319] /usr/local/openmpi-1.4.4/bin/mpiexec --mca btl_base_verbose 30 \
  --mca btl_tcp_if_include bge1 -machinefile mf1 ./z

hpc:kargl[320] /usr/local/openmpi-1.4.4/bin/mpiexec --mca btl_base_verbose 10 \
  --mca btl_tcp_if_include bge1 -machinefile mf1 ./z
[hpc.apl.washington.edu:12295] mca: base: components_open: Looking for btl components
[hpc.apl.washington.edu:12295] mca: base: components_open: opening btl components
[hpc.apl.washington.edu:12295] mca: base: components_open: found loaded component self
[hpc.apl.washington.edu:12295] mca: base: components_open: component self has no register function
[hpc.apl.washington.edu:12295] mca: base: components_open: component self open function successful
[hpc.apl.washington.edu:12295] mca: base: components_open: found loaded component sm
[hpc.apl.washington.edu:12295] mca: base: components_open: component sm has no register function
[hpc.apl.washington.edu:12295] mca: base: components_open: component sm open function successful
[hpc.apl.washington.edu:12295] mca: base: components_open: found loaded component tcp
[hpc.apl.washington.edu:12295] mca: base: components_open: component tcp has no register function
[hpc.apl.washington.edu:12295] mca: base: components_open: component tcp open function successful
[hpc.apl.washington.edu:12295] select: initializing btl component self
[hpc.apl.washington.edu:12295] select: init of component self returned success
[hpc.apl.washington.edu:12295] select: initializing btl component sm
[hpc.apl.washington.edu:12295] select: init of component sm returned success
[hpc.apl.washington.edu:12295] select: initializing btl component tcp
[hpc.apl.washington.edu:12295] select: init of component tcp returned success
[node11.cimu.org:21878] mca: base: components_open: Looking for btl components
[node11.cimu.org:21878] mca: base: components_open: opening btl components
[node11.cimu.org:21878] mca: base: components_open: found loaded component self
[node11.cimu.org:21878] mca: base: components_open: component self has no register function
[node11.cimu.org:21878] mca: base: components_open: component self open function successful
[node11.cimu.org:21878] mca: base: components_open: found loaded component sm
[node11.cimu.org:21878] mca: base: components_open: component sm has no register function
[node11.cimu.org:21878] mca: base: components_open: component sm open function successful
[node11.cimu.org:21878] mca: base: components_open: found loaded component tcp
[node11.cimu.org:21878] mca: base: components_open: component tcp has no register function
[node11.cimu.org:21878] mca: base: components_open: component tcp open function successful
[node11.cimu.org:21878] select: initializing btl component self
[node11.cimu.org:21878] select: init of component self returned success
[node11.cimu.org:21878] select: initializing btl component sm
[node11.cimu.org:21878] select: init of component sm returned success
[node11.cimu.org:21878] select: initializing btl component tcp
[node11.cimu.org][[13916,1],1][btl_tcp_component.c:468:mca_btl_tcp_component_create_instances] invalid interface "bge1"
[node11.cimu.org:21878] select: init of component tcp returned success
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.
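
In the log above, the 'invalid interface "bge1"' message is printed by
node11.cimu.org, not by hpc.apl.washington.edu, so one thing that may be
worth checking is whether an interface with that name exists on every
host in the machinefile. The following is only a rough sketch of such a
check, not part of the original session. It assumes that mf1 lists one
host per line (optionally followed by fields such as slots=N), that
passwordless ssh works to each host, and that ifconfig is on the remote
PATH. The function names machinefile_hosts and check_interface are
made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the host names listed in a machinefile.
# Assumes one host per line, optionally followed by extra fields such
# as "slots=2"; blank lines and lines starting with '#' are skipped.
machinefile_hosts() {
    awk '$1 ~ /^#/ { next } NF > 0 { print $1 }' "$1"
}

# Hypothetical helper: report whether interface $2 exists on each host
# named in machinefile $1. Assumes passwordless ssh to every host and
# that "ifconfig IFNAME" exits non-zero when the interface is absent.
check_interface() {
    for host in $(machinefile_hosts "$1"); do
        if ssh "$host" ifconfig "$2" >/dev/null 2>&1; then
            echo "$host: $2 present"
        else
            echo "$host: $2 missing or host unreachable"
        fi
    done
}

# Example (not run here): check_interface mf1 bge1
```

If a host reports the interface as missing, the name passed to
btl_tcp_if_include probably does not match that host's interface
naming, which would be consistent with the error shown above.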

-- 
Steve
