That works!!
Thanks!!

George Bosilca wrote:
>Sorry I wasn't clear enough in my previous post. The error messages that
>you get are coming from the OOB, which is the framework we use to set up
>the MPI run. The option that you used (btl_tcp_if_include) only applies to
>MPI communications. Please add "--mca oob_tcp_include eth1" to force the
>OOB framework to use eth1. To avoid typing all these options every time,
>you can add them to the $HOME/.openmpi/mca-params.conf file. A file
>containing:
>
>oob_tcp_include=eth1
>btl_tcp_if_include=eth1
>
>should solve your problems, provided the firewall is open on eth1 between
>these nodes.
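>
>Equivalently, you can pass the same settings directly on the mpiexec
>command line, something like:
>
>  mpiexec --mca oob_tcp_include eth1 --mca btl_tcp_if_include eth1 \
>          -n 2 ./mpimeasure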
>
>   Thanks,
>     george.
>
>On Thu, 16 Mar 2006, Charles Wright wrote:
>
>  
>>Thanks for the tip.
>>
>>I see that both 1 and 2 are true.
>>Open MPI is insisting on using my eth0 (I know this from watching the
>>firewall log on the node it is trying to reach).
>>
>>This is despite the fact that the first DNS entry points to eth1;
>>normally that is all PBS needs to do the right thing and use the
>>network I prefer.
>>
>>OK, so I see there are options to include/exclude interfaces.
>>
>>However, mpiexec is ignoring my requests.
>>I tried it two ways; neither worked. The firewall rejects traffic coming
>>into the 1.0.x.x network in both cases.
>>
>>/opt/asn/apps/openmpi-1.0.1/bin/mpiexec --gmca btl_tcp_if_include eth1
>>-n 2 $XD1LAUNCHER ./mpimeasure
>>/opt/asn/apps/openmpi-1.0.1/bin/mpiexec --gmca btl_tcp_if_exclude eth0
>>-n 2 $XD1LAUNCHER ./mpimeasure
>>
>>(See below: DNS resolves to the eth1 address, not eth0.)
>>uahrcw@c344-6:~/mpi-benchmarks> /sbin/ifconfig
>>eth0      Link encap:Ethernet  HWaddr 00:0E:AB:01:58:60
>>         inet addr:1.0.21.134  Bcast:1.127.255.255  Mask:255.128.0.0
>>         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>         RX packets:6596091 errors:0 dropped:0 overruns:0 frame:0
>>         TX packets:316165 errors:0 dropped:0 overruns:0 carrier:0
>>         collisions:0 txqueuelen:1000
>>         RX bytes:560395541 (534.4 Mb)  TX bytes:34367848 (32.7 Mb)
>>         Interrupt:16
>>
>>eth1      Link encap:Ethernet  HWaddr 00:0E:AB:01:58:61
>>         inet addr:1.128.21.134  Mask:255.128.0.0
>>         UP RUNNING NOARP  MTU:1500  Metric:1
>>         RX packets:5600487 errors:0 dropped:0 overruns:0 frame:0
>>         TX packets:4863441 errors:0 dropped:0 overruns:0 carrier:0
>>         collisions:0 txqueuelen:1000
>>         RX bytes:6203028277 (5915.6 Mb)  TX bytes:566471561 (540.2 Mb)
>>         Interrupt:25
>>
>>eth2      Link encap:Ethernet  HWaddr 00:0E:AB:01:58:62
>>         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>         RX packets:829064 errors:0 dropped:0 overruns:0 frame:0
>>         TX packets:181572 errors:0 dropped:0 overruns:0 carrier:0
>>         collisions:0 txqueuelen:1000
>>         RX bytes:61216408 (58.3 Mb)  TX bytes:19079579 (18.1 Mb)
>>         Base address:0x2000 Memory:fea80000-feaa0000
>>
>>eth2:2    Link encap:Ethernet  HWaddr 00:0E:AB:01:58:62
>>         inet addr:129.66.9.146  Bcast:129.66.9.255  Mask:255.255.255.0
>>         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>         Base address:0x2000 Memory:fea80000-feaa0000
>>
>>lo        Link encap:Local Loopback
>>         inet addr:127.0.0.1  Mask:255.0.0.0
>>         UP LOOPBACK RUNNING  MTU:16436  Metric:1
>>         RX packets:14259 errors:0 dropped:0 overruns:0 frame:0
>>         TX packets:14259 errors:0 dropped:0 overruns:0 carrier:0
>>         collisions:0 txqueuelen:0
>>         RX bytes:879631 (859.0 Kb)  TX bytes:879631 (859.0 Kb)
>>
>>uahrcw@c344-6:~/mpi-benchmarks> ping c344-5
>>PING c344-5.x.asc.edu (1.128.21.133) 56(84) bytes of data.
>>64 bytes from c344-5.x.asc.edu (1.128.21.133): icmp_seq=1 ttl=64
>>time=0.067 ms
>>64 bytes from c344-5.x.asc.edu (1.128.21.133): icmp_seq=2 ttl=64
>>time=0.037 ms
>>64 bytes from c344-5.x.asc.edu (1.128.21.133): icmp_seq=3 ttl=64
>>time=0.022 ms
>>
>>--- c344-5.x.asc.edu ping statistics ---
>>3 packets transmitted, 3 received, 0% packet loss, time 1999ms
>>rtt min/avg/max/mdev = 0.022/0.042/0.067/0.018 ms
>>
>>
>>
>>George Bosilca wrote:
>>    
>>>I see only two possibilities:
>>>1. You're trying to run Open MPI on nodes that have multiple IP
>>>addresses.
>>>2. Your nodes are behind firewalls and Open MPI is unable to get through.
>>>
>>>Please check the FAQ at http://www.open-mpi.org/faq/ to find the full
>>>answer to your question.
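>>>
>>>In either case, the relevant parameters (including the interface
>>>include/exclude ones) should show up with something like:
>>>
>>>  ompi_info --param btl tcp
>>>  ompi_info --param oob tcp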
>>>
>>>  Thanks,
>>>    george.
>>>
>>>On Thu, 16 Mar 2006, Charles Wright wrote:
>>>
>>>
>>>      
>>>>Hello,
>>>>  I've just compiled Open MPI and tried to run my code, which simply
>>>>measures bandwidth from one node to another.  (The code compiles fine
>>>>and runs under other MPI implementations.)
>>>>
>>>>When I did, I got this:
>>>>
>>>>uahrcw@c275-6:~/mpi-benchmarks> cat openmpitcp.o15380
>>>>c317-6
>>>>c317-5
>>>>[c317-5:24979] [0,0,2]-[0,0,0] mca_oob_tcp_peer_complete_connect:
>>>>connection failed (errno=110) - retrying (pid=24979)
>>>>[c317-5:24979] mca_oob_tcp_peer_timer_handler
>>>>[c317-5:24997] [0,1,1]-[0,0,0] mca_oob_tcp_peer_complete_connect:
>>>>connection failed (errno=110) - retrying (pid=24997)
>>>>[c317-5:24997] mca_oob_tcp_peer_timer_handler
>>>>
>>>>[0,1,1][btl_tcp_endpoint.c:559:mca_btl_tcp_endpoint_complete_connect]
>>>>connect() failed with errno=110
>>>>
>>>>
>>>>I compiled Open MPI with PBS Pro 5.4-4 and I'm guessing that has
>>>>something to do with it.
>>>>
>>>>I've attached my config.log
>>>>
>>>>Any help with this would be appreciated.
>>>>
>>>>uahrcw@c275-6:~/mpi-benchmarks> ompi_info
>>>>              Open MPI: 1.0.1r8453
>>>> Open MPI SVN revision: r8453
>>>>              Open RTE: 1.0.1r8453
>>>> Open RTE SVN revision: r8453
>>>>                  OPAL: 1.0.1r8453
>>>>     OPAL SVN revision: r8453
>>>>                Prefix: /opt/asn/apps/openmpi-1.0.1
>>>>Configured architecture: x86_64-unknown-linux-gnu
>>>>         Configured by: asnrcw
>>>>         Configured on: Fri Feb 24 15:19:37 CST 2006
>>>>        Configure host: c275-6
>>>>              Built by: asnrcw
>>>>              Built on: Fri Feb 24 15:40:09 CST 2006
>>>>            Built host: c275-6
>>>>            C bindings: yes
>>>>          C++ bindings: yes
>>>>    Fortran77 bindings: yes (all)
>>>>    Fortran90 bindings: no
>>>>            C compiler: gcc
>>>>   C compiler absolute: /usr/bin/gcc
>>>>          C++ compiler: g++
>>>> C++ compiler absolute: /usr/bin/g++
>>>>    Fortran77 compiler: g77
>>>>Fortran77 compiler abs: /usr/bin/g77
>>>>    Fortran90 compiler: ifort
>>>>Fortran90 compiler abs: /opt/asn/intel/fce/9.0/bin/ifort
>>>>           C profiling: yes
>>>>         C++ profiling: yes
>>>>   Fortran77 profiling: yes
>>>>   Fortran90 profiling: no
>>>>        C++ exceptions: no
>>>>        Thread support: posix (mpi: no, progress: no)
>>>>Internal debug support: no
>>>>   MPI parameter check: runtime
>>>>Memory profiling support: no
>>>>Memory debugging support: no
>>>>       libltdl support: 1
>>>>            MCA memory: malloc_hooks (MCA v1.0, API v1.0, Component
>>>>v1.0.1)
>>>>         MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.0.1)
>>>>         MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.0.1)
>>>>         MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.0.1)
>>>>             MCA timer: linux (MCA v1.0, API v1.0, Component v1.0.1)
>>>>         MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>>>>         MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
>>>>              MCA coll: basic (MCA v1.0, API v1.0, Component v1.0.1)
>>>>              MCA coll: self (MCA v1.0, API v1.0, Component v1.0.1)
>>>>              MCA coll: sm (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                MCA io: romio (MCA v1.0, API v1.0, Component v1.0.1)
>>>>             MCA mpool: sm (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA pml: teg (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA ptl: self (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA ptl: sm (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA ptl: tcp (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA btl: self (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA btl: sm (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
>>>>              MCA topo: unity (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA gpr: null (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA gpr: replica (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA iof: proxy (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA iof: svc (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                MCA ns: proxy (MCA v1.0, API v1.0, Component v1.0.1)
>>>>                MCA ns: replica (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
>>>>               MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA ras: localhost (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA ras: slurm (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA ras: tm (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA rds: resfile (MCA v1.0, API v1.0, Component v1.0.1)
>>>>             MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.0.1)
>>>>              MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.0.1)
>>>>              MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA rml: oob (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA pls: daemon (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA pls: proxy (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA pls: fork (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA pls: rsh (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA pls: slurm (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA pls: tm (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA sds: env (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA sds: seed (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA sds: singleton (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA sds: slurm (MCA v1.0, API v1.0, Component v1.0.1)
>>>>               MCA sds: pipe (MCA v1.0, API v1.0, Component v1.0.1)
>>>>uahrcw@c275-6:~/mpi-benchmarks>
>>>>
>>>>
>>>>
>>>>


-- 
Charles Wright, HPC Systems Administrator
Alabama Research and Education Network
Computer Sciences Corporation 
