Sorry I wasn't clear enough in my previous post. The error messages you are getting come from the OOB, which is the framework we use to set up the MPI run. The option you used (btl_tcp_if_include) only applies to the MPI communications themselves. Please also add "--mca oob_tcp_include eth1" to force the OOB framework to use eth1.
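For example, reusing the command line from your mail (only a sketch; keep your own launcher and process count), it would look like:

  /opt/asn/apps/openmpi-1.0.1/bin/mpiexec --mca oob_tcp_include eth1 \
      --mca btl_tcp_if_include eth1 -n 2 $XD1LAUNCHER ./mpimeasure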
To avoid typing all of these options every time, you can put them in the $HOME/.openmpi/mca-params.conf file. A file containing:

  oob_tcp_include=eth1
  btl_tcp_if_include=eth1

should solve your problem, provided the firewall is open on eth1 between these nodes. (See the P.S. at the end of this mail for a quick way to check that the parameters are actually being picked up.)

Thanks,
  george.

On Thu, 16 Mar 2006, Charles Wright wrote:

> Thanks for the tip.
>
> I see that both number 1 and 2 are true.
> Open MPI is insisting on using my eth0 (I know this by watching the
> firewall log on the node it is trying to reach).
>
> This is despite the fact that the first DNS entry points to eth1;
> normally that is all PBS would need to do the right thing and use the
> network I prefer.
>
> OK, so I see there are some options to include/exclude interfaces.
>
> However, mpiexec is ignoring my requests. I tried it two ways; neither
> worked. The firewall rejects traffic coming into the 1.0.x.x network in
> both cases.
>
> /opt/asn/apps/openmpi-1.0.1/bin/mpiexec --gmca btl_tcp_if_include eth1 \
>     -n 2 $XD1LAUNCHER ./mpimeasure
> /opt/asn/apps/openmpi-1.0.1/bin/mpiexec --gmca btl_tcp_if_exclude eth0 \
>     -n 2 $XD1LAUNCHER ./mpimeasure
>
> (See, DNS works... not over eth0.)
>
> uahrcw@c344-6:~/mpi-benchmarks> /sbin/ifconfig
> eth0    Link encap:Ethernet  HWaddr 00:0E:AB:01:58:60
>         inet addr:1.0.21.134  Bcast:1.127.255.255  Mask:255.128.0.0
>         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>         RX packets:6596091 errors:0 dropped:0 overruns:0 frame:0
>         TX packets:316165 errors:0 dropped:0 overruns:0 carrier:0
>         collisions:0 txqueuelen:1000
>         RX bytes:560395541 (534.4 Mb)  TX bytes:34367848 (32.7 Mb)
>         Interrupt:16
>
> eth1    Link encap:Ethernet  HWaddr 00:0E:AB:01:58:61
>         inet addr:1.128.21.134  Mask:255.128.0.0
>         UP RUNNING NOARP  MTU:1500  Metric:1
>         RX packets:5600487 errors:0 dropped:0 overruns:0 frame:0
>         TX packets:4863441 errors:0 dropped:0 overruns:0 carrier:0
>         collisions:0 txqueuelen:1000
>         RX bytes:6203028277 (5915.6 Mb)  TX bytes:566471561 (540.2 Mb)
>         Interrupt:25
>
> eth2    Link encap:Ethernet  HWaddr 00:0E:AB:01:58:62
>         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>         RX packets:829064 errors:0 dropped:0 overruns:0 frame:0
>         TX packets:181572 errors:0 dropped:0 overruns:0 carrier:0
>         collisions:0 txqueuelen:1000
>         RX bytes:61216408 (58.3 Mb)  TX bytes:19079579 (18.1 Mb)
>         Base address:0x2000 Memory:fea80000-feaa0000
>
> eth2:2  Link encap:Ethernet  HWaddr 00:0E:AB:01:58:62
>         inet addr:129.66.9.146  Bcast:129.66.9.255  Mask:255.255.255.0
>         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>         Base address:0x2000 Memory:fea80000-feaa0000
>
> lo      Link encap:Local Loopback
>         inet addr:127.0.0.1  Mask:255.0.0.0
>         UP LOOPBACK RUNNING  MTU:16436  Metric:1
>         RX packets:14259 errors:0 dropped:0 overruns:0 frame:0
>         TX packets:14259 errors:0 dropped:0 overruns:0 carrier:0
>         collisions:0 txqueuelen:0
>         RX bytes:879631 (859.0 Kb)  TX bytes:879631 (859.0 Kb)
>
> uahrcw@c344-6:~/mpi-benchmarks> ping c344-5
> PING c344-5.x.asc.edu (1.128.21.133) 56(84) bytes of data.
> 64 bytes from c344-5.x.asc.edu (1.128.21.133): icmp_seq=1 ttl=64 time=0.067 ms
> 64 bytes from c344-5.x.asc.edu (1.128.21.133): icmp_seq=2 ttl=64 time=0.037 ms
> 64 bytes from c344-5.x.asc.edu (1.128.21.133): icmp_seq=3 ttl=64 time=0.022 ms
>
> --- c344-5.x.asc.edu ping statistics ---
> 3 packets transmitted, 3 received, 0% packet loss, time 1999ms
> rtt min/avg/max/mdev = 0.022/0.042/0.067/0.018 ms
>
> George Bosilca wrote:
>> I see only two possibilities:
>>   1. You are trying to run Open MPI on nodes having multiple IP addresses.
>>   2. Your nodes are behind firewalls and Open MPI is unable to pass through.
>>
>> Please check the FAQ at http://www.open-mpi.org/faq/ to find out the full
>> answer to your question.
>>
>> Thanks,
>>   george.
>>
>> On Thu, 16 Mar 2006, Charles Wright wrote:
>>
>>> Hello,
>>> I've just compiled Open MPI and tried to run my code, which simply
>>> measures bandwidth from one node to another. (The code compiles fine
>>> and runs under other MPI implementations.)
>>>
>>> When I did, I got this:
>>>
>>> uahrcw@c275-6:~/mpi-benchmarks> cat openmpitcp.o15380
>>> c317-6
>>> c317-5
>>> [c317-5:24979] [0,0,2]-[0,0,0] mca_oob_tcp_peer_complete_connect:
>>>     connection failed (errno=110) - retrying (pid=24979)
>>> [c317-5:24979] mca_oob_tcp_peer_timer_handler
>>> [c317-5:24997] [0,1,1]-[0,0,0] mca_oob_tcp_peer_complete_connect:
>>>     connection failed (errno=110) - retrying (pid=24997)
>>> [c317-5:24997] mca_oob_tcp_peer_timer_handler
>>>
>>> [0,1,1][btl_tcp_endpoint.c:559:mca_btl_tcp_endpoint_complete_connect]
>>>     connect() failed with errno=110
>>>
>>> I compiled Open MPI with PBS Pro 5.4-4 and I'm guessing that has
>>> something to do with it.
>>>
>>> I've attached my config.log.
>>>
>>> Any help with this would be appreciated.
>>>
>>> uahrcw@c275-6:~/mpi-benchmarks> ompi_info
>>> Open MPI: 1.0.1r8453
>>> Open MPI SVN revision: r8453
>>> Open RTE: 1.0.1r8453
>>> Open RTE SVN revision: r8453
>>> OPAL: 1.0.1r8453
>>> OPAL SVN revision: r8453
>>> Prefix: /opt/asn/apps/openmpi-1.0.1
>>> Configured architecture: x86_64-unknown-linux-gnu
>>> Configured by: asnrcw
>>> Configured on: Fri Feb 24 15:19:37 CST 2006
>>> Configure host: c275-6
>>> Built by: asnrcw
>>> Built on: Fri Feb 24 15:40:09 CST 2006
>>> Built host: c275-6
>>> C bindings: yes
>>> C++ bindings: yes
>>> Fortran77 bindings: yes (all)
>>> Fortran90 bindings: no
>>> C compiler: gcc
>>> C compiler absolute: /usr/bin/gcc
>>> C++ compiler: g++
>>> C++ compiler absolute: /usr/bin/g++
>>> Fortran77 compiler: g77
>>> Fortran77 compiler abs: /usr/bin/g77
>>> Fortran90 compiler: ifort
>>> Fortran90 compiler abs: /opt/asn/intel/fce/9.0/bin/ifort
>>> C profiling: yes
>>> C++ profiling: yes
>>> Fortran77 profiling: yes
>>> Fortran90 profiling: no
>>> C++ exceptions: no
>>> Thread support: posix (mpi: no, progress: no)
>>> Internal debug support: no
>>> MPI parameter check: runtime
>>> Memory profiling support: no
>>> Memory debugging support: no
>>> libltdl support: 1
>>> MCA memory: malloc_hooks (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA timer: linux (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>>> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
>>> MCA coll: basic (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA coll: self (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA coll: sm (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA io: romio (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA pml: teg (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA ptl: self (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA ptl: sm (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA ptl: tcp (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA btl: self (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA btl: sm (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
>>> MCA topo: unity (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA gpr: null (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA iof: svc (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA ns: proxy (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA ns: replica (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
>>> MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA ras: localhost (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA ras: slurm (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA ras: tm (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA rds: resfile (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA rml: oob (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA pls: daemon (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA pls: proxy (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA pls: fork (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA pls: rsh (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA pls: slurm (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA pls: tm (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA sds: env (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA sds: seed (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA sds: singleton (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA sds: slurm (MCA v1.0, API v1.0, Component v1.0.1)
>>> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.0.1)
>>> uahrcw@c275-6:~/mpi-benchmarks>
>>
>> "We must accept finite disappointment, but we must never lose infinite
>> hope."  Martin Luther King
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

"We must accept finite disappointment, but we must never lose infinite
hope."  Martin Luther King
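P.S. To double-check that both parameters are actually being picked up (whether from the command line or from mca-params.conf), you can ask ompi_info for the parameters of the TCP components. Something along these lines should do it, although the exact listing varies between versions:

  # show the include/exclude parameters of the TCP OOB and BTL components
  ompi_info --param oob tcp | grep include
  ompi_info --param btl tcp | grep include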