That works!!  Thanks!!

George Bosilca wrote:
>Sorry I wasn't clear enough in my previous post. The error messages you
>are getting come from the OOB, which is the framework we use to set up
>the MPI run. The option you used (btl_tcp_if_include) only applies to
>MPI communications. Please add "--mca oob_tcp_include eth1" to force the
>OOB framework to use eth1 as well. So that you don't have to type all of
>these options every time, you can add them to the
>$HOME/.openmpi/mca-params.conf file. A file containing:
>
>oob_tcp_include=eth1
>btl_tcp_if_include=eth1
>
>should solve your problem, provided the firewall is open on eth1 between
>these nodes.
>
>  Thanks,
>    george.
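For reference, the one-shot command-line equivalent of those two parameters, reusing the launcher wrapper and benchmark binary from the runs quoted below, should look roughly like this (both the OOB and the BTL parameter are restricted to eth1, not just the BTL one):

    # restrict both the runtime (OOB) and the MPI (BTL TCP) traffic to eth1
    /opt/asn/apps/openmpi-1.0.1/bin/mpiexec \
        --mca oob_tcp_include eth1 \
        --mca btl_tcp_if_include eth1 \
        -n 2 $XD1LAUNCHER ./mpimeasure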
>
>On Thu, 16 Mar 2006, Charles Wright wrote:
>
>>Thanks for the tip.
>>
>>I see that both 1 and 2 are true.
>>Open MPI is insisting on using eth0 (I know this from watching the
>>firewall log on the node it is trying to reach).
>>
>>This is despite the fact that the first DNS entry points to eth1;
>>normally that is all PBS needs to do the right thing and use the
>>network I prefer.
>>
>>OK, so I see there are options to include/exclude interfaces.
>>
>>However, mpiexec is ignoring my requests. I tried it two ways; neither
>>worked. The firewall rejects traffic coming into the 1.0.x.x network in
>>both cases.
>>
>>/opt/asn/apps/openmpi-1.0.1/bin/mpiexec --gmca btl_tcp_if_include eth1
>>-n 2 $XD1LAUNCHER ./mpimeasure
>>/opt/asn/apps/openmpi-1.0.1/bin/mpiexec --gmca btl_tcp_if_exclude eth0
>>-n 2 $XD1LAUNCHER ./mpimeasure
>>
>>(See, DNS works... and not over eth0.)
>>uahrcw@c344-6:~/mpi-benchmarks> /sbin/ifconfig
>>eth0      Link encap:Ethernet  HWaddr 00:0E:AB:01:58:60
>>          inet addr:1.0.21.134  Bcast:1.127.255.255  Mask:255.128.0.0
>>          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>          RX packets:6596091 errors:0 dropped:0 overruns:0 frame:0
>>          TX packets:316165 errors:0 dropped:0 overruns:0 carrier:0
>>          collisions:0 txqueuelen:1000
>>          RX bytes:560395541 (534.4 Mb)  TX bytes:34367848 (32.7 Mb)
>>          Interrupt:16
>>
>>eth1      Link encap:Ethernet  HWaddr 00:0E:AB:01:58:61
>>          inet addr:1.128.21.134  Mask:255.128.0.0
>>          UP RUNNING NOARP  MTU:1500  Metric:1
>>          RX packets:5600487 errors:0 dropped:0 overruns:0 frame:0
>>          TX packets:4863441 errors:0 dropped:0 overruns:0 carrier:0
>>          collisions:0 txqueuelen:1000
>>          RX bytes:6203028277 (5915.6 Mb)  TX bytes:566471561 (540.2 Mb)
>>          Interrupt:25
>>
>>eth2      Link encap:Ethernet  HWaddr 00:0E:AB:01:58:62
>>          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>          RX packets:829064 errors:0 dropped:0 overruns:0 frame:0
>>          TX packets:181572 errors:0 dropped:0 overruns:0 carrier:0
>>          collisions:0 txqueuelen:1000
>>          RX bytes:61216408 (58.3 Mb)  TX bytes:19079579 (18.1 Mb)
>>          Base address:0x2000 Memory:fea80000-feaa0000
>>
>>eth2:2    Link encap:Ethernet  HWaddr 00:0E:AB:01:58:62
>>          inet addr:129.66.9.146  Bcast:129.66.9.255  Mask:255.255.255.0
>>          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>          Base address:0x2000 Memory:fea80000-feaa0000
>>
>>lo        Link encap:Local Loopback
>>          inet addr:127.0.0.1  Mask:255.0.0.0
>>          UP LOOPBACK RUNNING  MTU:16436  Metric:1
>>          RX packets:14259 errors:0 dropped:0 overruns:0 frame:0
>>          TX packets:14259 errors:0 dropped:0 overruns:0 carrier:0
>>          collisions:0 txqueuelen:0
>>          RX bytes:879631 (859.0 Kb)  TX bytes:879631 (859.0 Kb)
>>
>>uahrcw@c344-6:~/mpi-benchmarks> ping c344-5
>>PING c344-5.x.asc.edu (1.128.21.133) 56(84) bytes of data.
>>64 bytes from c344-5.x.asc.edu (1.128.21.133): icmp_seq=1 ttl=64 time=0.067 ms
>>64 bytes from c344-5.x.asc.edu (1.128.21.133): icmp_seq=2 ttl=64 time=0.037 ms
>>64 bytes from c344-5.x.asc.edu (1.128.21.133): icmp_seq=3 ttl=64 time=0.022 ms
>>
>>--- c344-5.x.asc.edu ping statistics ---
>>3 packets transmitted, 3 received, 0% packet loss, time 1999ms
>>rtt min/avg/max/mdev = 0.022/0.042/0.067/0.018 ms
>>
>>
>>George Bosilca wrote:
>>
>>>I see only two possibilities:
>>>1. You are trying to run Open MPI on nodes that have multiple IP
>>>addresses.
>>>2. Your nodes are behind firewalls and Open MPI is unable to get
>>>through.
>>>
>>>Please check the FAQ at http://www.open-mpi.org/faq/ to find the full
>>>answer to your question.
>>>
>>>  Thanks,
>>>    george.
>>>
>>>On Thu, 16 Mar 2006, Charles Wright wrote:
>>>
>>>>Hello,
>>>>  I've just compiled Open MPI and tried to run my code, which simply
>>>>measures bandwidth from one node to another. (The code compiles fine
>>>>and runs under other MPI implementations.)
>>>>
>>>>When I did, I got this:
>>>>
>>>>uahrcw@c275-6:~/mpi-benchmarks> cat openmpitcp.o15380
>>>>c317-6
>>>>c317-5
>>>>[c317-5:24979] [0,0,2]-[0,0,0] mca_oob_tcp_peer_complete_connect:
>>>>connection failed (errno=110) - retrying (pid=24979)
>>>>[c317-5:24979] mca_oob_tcp_peer_timer_handler
>>>>[c317-5:24997] [0,1,1]-[0,0,0] mca_oob_tcp_peer_complete_connect:
>>>>connection failed (errno=110) - retrying (pid=24997)
>>>>[c317-5:24997] mca_oob_tcp_peer_timer_handler
>>>>
>>>>[0,1,1][btl_tcp_endpoint.c:559:mca_btl_tcp_endpoint_complete_connect]
>>>>connect() failed with errno=110
>>>>
>>>>I compiled Open MPI with PBS Pro 5.4-4 and I'm guessing that has
>>>>something to do with it.
>>>>
>>>>I've attached my config.log.
>>>>
>>>>Any help with this would be appreciated.
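Note that errno=110 is ETIMEDOUT on Linux, which fits a firewall silently dropping the OOB connection attempts on the 1.0.x.x network. If in doubt about the exact include/exclude parameter names a particular build understands, ompi_info can list them per component (assuming the installed version supports per-component parameter queries), for example:

    # list the TCP-related interface selection parameters for the OOB and BTL components
    ompi_info --param oob tcp | grep -E 'include|exclude'
    ompi_info --param btl tcp | grep -E 'include|exclude'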
>>>>uahrcw@c275-6:~/mpi-benchmarks> ompi_info
>>>>                Open MPI: 1.0.1r8453
>>>>   Open MPI SVN revision: r8453
>>>>                Open RTE: 1.0.1r8453
>>>>   Open RTE SVN revision: r8453
>>>>                    OPAL: 1.0.1r8453
>>>>       OPAL SVN revision: r8453
>>>>                  Prefix: /opt/asn/apps/openmpi-1.0.1
>>>> Configured architecture: x86_64-unknown-linux-gnu
>>>>           Configured by: asnrcw
>>>>           Configured on: Fri Feb 24 15:19:37 CST 2006
>>>>          Configure host: c275-6
>>>>                Built by: asnrcw
>>>>                Built on: Fri Feb 24 15:40:09 CST 2006
>>>>              Built host: c275-6
>>>>              C bindings: yes
>>>>            C++ bindings: yes
>>>>      Fortran77 bindings: yes (all)
>>>>      Fortran90 bindings: no
>>>>              C compiler: gcc
>>>>     C compiler absolute: /usr/bin/gcc
>>>>            C++ compiler: g++
>>>>   C++ compiler absolute: /usr/bin/g++
>>>>      Fortran77 compiler: g77
>>>>  Fortran77 compiler abs: /usr/bin/g77
>>>>      Fortran90 compiler: ifort
>>>>  Fortran90 compiler abs: /opt/asn/intel/fce/9.0/bin/ifort
>>>>             C profiling: yes
>>>>           C++ profiling: yes
>>>>     Fortran77 profiling: yes
>>>>     Fortran90 profiling: no
>>>>          C++ exceptions: no
>>>>          Thread support: posix (mpi: no, progress: no)
>>>>  Internal debug support: no
>>>>     MPI parameter check: runtime
>>>>Memory profiling support: no
>>>>Memory debugging support: no
>>>>         libltdl support: 1
>>>>              MCA memory: malloc_hooks (MCA v1.0, API v1.0, Component v1.0.1)
>>>>           MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.0.1)
>>>>           MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.0.1)
>>>>           MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA timer: linux (MCA v1.0, API v1.0, Component v1.0.1)
>>>>           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>>>>           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
>>>>                MCA coll: basic (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                MCA coll: self (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                MCA coll: sm (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                  MCA io: romio (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA pml: teg (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA ptl: self (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA ptl: sm (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA ptl: tcp (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA btl: self (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA btl: sm (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
>>>>                MCA topo: unity (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA gpr: null (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA iof: svc (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                  MCA ns: proxy (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                  MCA ns: replica (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
>>>>                 MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA ras: localhost (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA ras: slurm (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA ras: tm (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA rds: resfile (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA rml: oob (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA pls: daemon (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA pls: proxy (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA pls: fork (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA pls: rsh (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA pls: slurm (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA pls: tm (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA sds: env (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA sds: seed (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA sds: singleton (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA sds: slurm (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                 MCA sds: pipe (MCA v1.0, API v1.0, Component v1.0.1)
>>>>uahrcw@c275-6:~/mpi-benchmarks>
>>>>
>>>
>>>"We must accept finite disappointment, but we must never lose infinite
>>>hope."
>>>                                                  Martin Luther King
>>>
>>
>
--
Charles Wright, HPC Systems Administrator
Alabama Research and Education Network
Computer Sciences Corporation