On Fri, Jul 08, 2011 at 02:19:27PM -0400, Jeff Squyres wrote:
>
> The easiest way to fix this is likely to use the btl_tcp_if_include
> or btl_tcp_if_exclude MCA parameters -- i.e., tell OMPI exactly
> which interfaces to use:
>
>     http://www.open-mpi.org/faq/?category=tcp#tcp-selection
>
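Before passing a name to btl_tcp_if_include, it may help to confirm
that the name exists on each host in the machinefile. The following is
a minimal Python sketch, not part of Open MPI; the function name
missing_interfaces is hypothetical, and it assumes a Python 3
interpreter is available on the node being checked.

```python
import socket


def missing_interfaces(requested, available=None):
    """Return the requested interface names that are not present.

    `available` is a list of interface names; when omitted, the names
    reported by the local kernel are used. (Hypothetical helper, not an
    Open MPI API.)
    """
    if available is None:
        # socket.if_nameindex() yields (index, name) pairs for the
        # network interfaces known to the local host.
        available = [name for _index, name in socket.if_nameindex()]
    present = set(available)
    # Preserve the caller's order so the report is easy to read.
    return [name for name in requested if name not in present]
```

Running missing_interfaces(["bge1"]) on every node listed in the
machinefile would show which hosts, if any, lack an interface by that
name.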

Perhaps, I'm again misreading the output, but it appears that
1.4.4rc2 does not even see the 2nd nic.

hpc:kargl[317] ifconfig bge0
bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
        ether 00:e0:81:40:48:92
        inet 10.208.78.111 netmask 0xffffff00 broadcast 10.208.78.255
        inet6 fe80::2e0:81ff:fe40:4892%bge0 prefixlen 64 scopeid 0x3
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
hpc:kargl[318] ifconfig bge1
bge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
        ether 00:e0:81:40:48:93
        inet 192.168.0.10 netmask 0xffffff00 broadcast 192.168.0.255
        inet6 fe80::2e0:81ff:fe40:4893%bge1 prefixlen 64 scopeid 0x4
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active

kargl[319] /usr/local/openmpi-1.4.4/bin/mpiexec --mca btl_base_verbose 30 \
   --mca btl_tcp_if_include bge1 -machinefile mf1 ./z

hpc:kargl[320] /usr/local/openmpi-1.4.4/bin/mpiexec --mca btl_base_verbose 10 --mca btl_tcp_if_include bge1 -machinefile mf1 ./z
[hpc.apl.washington.edu:12295] mca: base: components_open: Looking for btl components
[hpc.apl.washington.edu:12295] mca: base: components_open: opening btl components
[hpc.apl.washington.edu:12295] mca: base: components_open: found loaded component self
[hpc.apl.washington.edu:12295] mca: base: components_open: component self has no register function
[hpc.apl.washington.edu:12295] mca: base: components_open: component self open function successful
[hpc.apl.washington.edu:12295] mca: base: components_open: found loaded component sm
[hpc.apl.washington.edu:12295] mca: base: components_open: component sm has no register function
[hpc.apl.washington.edu:12295] mca: base: components_open: component sm open function successful
[hpc.apl.washington.edu:12295] mca: base: components_open: found loaded component tcp
[hpc.apl.washington.edu:12295] mca: base: components_open: component tcp has no register function
[hpc.apl.washington.edu:12295] mca: base: components_open: component tcp open function successful
[hpc.apl.washington.edu:12295] select: initializing btl component self
[hpc.apl.washington.edu:12295] select: init of component self returned success
[hpc.apl.washington.edu:12295] select: initializing btl component sm
[hpc.apl.washington.edu:12295] select: init of component sm returned success
[hpc.apl.washington.edu:12295] select: initializing btl component tcp
[hpc.apl.washington.edu:12295] select: init of component tcp returned success
[node11.cimu.org:21878] mca: base: components_open: Looking for btl components
[node11.cimu.org:21878] mca: base: components_open: opening btl components
[node11.cimu.org:21878] mca: base: components_open: found loaded component self
[node11.cimu.org:21878] mca: base: components_open: component self has no register function
[node11.cimu.org:21878] mca: base: components_open: component self open function successful
[node11.cimu.org:21878] mca: base: components_open: found loaded component sm
[node11.cimu.org:21878] mca: base: components_open: component sm has no register function
[node11.cimu.org:21878] mca: base: components_open: component sm open function successful
[node11.cimu.org:21878] mca: base: components_open: found loaded component tcp
[node11.cimu.org:21878] mca: base: components_open: component tcp has no register function
[node11.cimu.org:21878] mca: base: components_open: component tcp open function successful
[node11.cimu.org:21878] select: initializing btl component self
[node11.cimu.org:21878] select: init of component self returned success
[node11.cimu.org:21878] select: initializing btl component sm
[node11.cimu.org:21878] select: init of component sm returned success
[node11.cimu.org:21878] select: initializing btl component tcp
[node11.cimu.org][[13916,1],1][btl_tcp_component.c:468:mca_btl_tcp_component_create_instances] invalid interface "bge1"
[node11.cimu.org:21878] select: init of component tcp returned success
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications. This means that no Open MPI device has indicated
that it can be used to communicate between these processes. This is
an error; Open MPI requires that all MPI processes be able to reach
each other. This error can sometimes be the result of forgetting to
specify the "self" BTL.

-- 
Steve
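
In the verbose output above, the only "invalid interface" complaint
comes from node11.cimu.org, not from hpc, which suggests (though the
log alone does not prove it) that the name bge1 is not known on that
remote node. For longer logs, a small Python sketch like the one below
could pull out such lines. The regular expression is inferred from the
single line in this log, not from any documented Open MPI format, and
the function name invalid_interfaces is hypothetical.

```python
import re

# Pattern inferred from one observed line, e.g.:
#   [node11.cimu.org][[13916,1],1][btl_tcp_component.c:468:...] invalid interface "bge1"
# It is an assumption that other Open MPI builds print the same layout.
INVALID_IF = re.compile(
    r'^\[(?P<host>[^\]]+)\]'          # [hostname]
    r'\[\[.*?\],(?P<rank>\d+)\]'      # [[jobid,N],rank]
    r'\[[^\]]*\]\s+'                  # [source file:line:function]
    r'invalid interface "(?P<iface>[^"]+)"'
)


def invalid_interfaces(log_text):
    """Return (host, rank, interface) for each 'invalid interface' line."""
    found = []
    for line in log_text.splitlines():
        match = INVALID_IF.match(line.strip())
        if match:
            found.append(
                (match.group("host"), int(match.group("rank")), match.group("iface"))
            )
    return found
```

Feeding the captured mpiexec output to invalid_interfaces() gives one
tuple per complaining process, which makes it easier to see which
hosts need a different interface name in btl_tcp_if_include.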