If you use btl_tcp_if_exclude, you also need to exclude the loopback 
interface.  Loopback is excluded by the default value of btl_tcp_if_exclude, 
but if you override that value, you need to *also* include the loopback 
interface in the new value.
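For example, the command line from your last message would become something like the following. This is a sketch that assumes the loopback interface is named "lo" on both wirth and karp; that is true on karp per the ifconfig output quoted below, but I haven't seen wirth's output.

```shell
# Overriding btl_tcp_if_exclude replaces its default value, so loopback (lo)
# has to be listed again alongside br2.
mpiexec -n 2 --mca btl_tcp_if_exclude lo,br2 -host wirth,karp ./a.out
```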



On Mar 24, 2014, at 4:57 AM, Hamid Saeed <e.hamidsa...@gmail.com> wrote:

> Hello,
> I am still facing problems.
> I checked, and there is no firewall acting as a barrier to the MPI 
> communication.
> 
> I even tried this command line:
> hsaeed@karp:~/Task4_mpi/scatterv$ mpiexec -n 2 --mca btl_tcp_if_exclude br2 -host wirth,karp ./a.out
> 
> Now the program hangs without displaying any error.
> 
> I used "--mca btl_tcp_if_exclude br2" because the failed connection was to br2 
> (10.231.2.231), as you can see in the ifconfig output quoted below.
> 
> I know something is wrong, but I have not been able to figure out what.
> 
> Any further suggestions would be appreciated.
> 
> Regards,
> 
> 
> On Fri, Mar 21, 2014 at 6:05 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
> wrote:
> Do you have any firewalling enabled on these machines?  If so, you'll want to 
> either disable it, or allow TCP connections on arbitrary ports between all of 
> the cluster nodes.
> 
> 
> On Mar 21, 2014, at 10:24 AM, Hamid Saeed <e.hamidsa...@gmail.com> wrote:
> 
> > /sbin/ifconfig
> >
> > hsaeed@karp:~$ /sbin/ifconfig
> > br0       Link encap:Ethernet  HWaddr 00:25:90:59:c9:ba
> >           inet addr:134.106.3.231  Bcast:134.106.3.255  Mask:255.255.255.0
> >           inet6 addr: fe80::225:90ff:fe59:c9ba/64 Scope:Link
> >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >           RX packets:49080961 errors:0 dropped:50263 overruns:0 frame:0
> >           TX packets:43279252 errors:0 dropped:0 overruns:0 carrier:0
> >           collisions:0 txqueuelen:0
> >           RX bytes:41348407558 (38.5 GiB)  TX bytes:80505842745 (74.9 GiB)
> >
> > br1       Link encap:Ethernet  HWaddr 00:25:90:59:c9:bb
> >           inet addr:134.106.53.231  Bcast:134.106.53.255  Mask:255.255.255.0
> >           inet6 addr: fe80::225:90ff:fe59:c9bb/64 Scope:Link
> >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >           RX packets:41573060 errors:0 dropped:50261 overruns:0 frame:0
> >           TX packets:1693509 errors:0 dropped:0 overruns:0 carrier:0
> >           collisions:0 txqueuelen:0
> >           RX bytes:6177072160 (5.7 GiB)  TX bytes:230617435 (219.9 MiB)
> >
> > br2       Link encap:Ethernet  HWaddr 00:c0:0a:ec:02:e7
> >           inet addr:10.231.2.231  Bcast:10.231.2.255  Mask:255.255.255.0
> >           UP BROADCAST MULTICAST  MTU:1500  Metric:1
> >           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> >           collisions:0 txqueuelen:0
> >           RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
> >
> > eth0      Link encap:Ethernet  HWaddr 00:25:90:59:c9:ba
> >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >           RX packets:69108377 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:86459066 errors:0 dropped:0 overruns:0 carrier:0
> >           collisions:0 txqueuelen:1000
> >           RX bytes:43533091399 (40.5 GiB)  TX bytes:83359370885 (77.6 GiB)
> >           Memory:dfe60000-dfe80000
> >
> > eth1      Link encap:Ethernet  HWaddr 00:25:90:59:c9:bb
> >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >           RX packets:43531546 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:1716151 errors:0 dropped:0 overruns:0 carrier:0
> >           collisions:0 txqueuelen:1000
> >           RX bytes:7201915977 (6.7 GiB)  TX bytes:232026383 (221.2 MiB)
> >           Memory:dfee0000-dff00000
> >
> > lo        Link encap:Local Loopback
> >           inet addr:127.0.0.1  Mask:255.0.0.0
> >           inet6 addr: ::1/128 Scope:Host
> >           UP LOOPBACK RUNNING  MTU:16436  Metric:1
> >           RX packets:10890707 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:10890707 errors:0 dropped:0 overruns:0 carrier:0
> >           collisions:0 txqueuelen:0
> >           RX bytes:36194379576 (33.7 GiB)  TX bytes:36194379576 (33.7 GiB)
> >
> > tap0      Link encap:Ethernet  HWaddr 00:c0:0a:ec:02:e7
> >           UP BROADCAST MULTICAST  MTU:1500  Metric:1
> >           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> >           collisions:0 txqueuelen:500
> >           RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
> >
> > When I execute the following line
> >
> > hsaeed@karp:~/Task4_mpi/scatterv$ mpiexec -n 2 -host wirth,karp ./a.out
> >
> > I receive this error:
> >
> > [wirth][[59430,1],0][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect]
> >  connect() to 10.231.2.231 failed: Connection refused (111)
> >
> >
> > NOTE: karp and wirth are two machines on an SSH cluster.
> >
> >
> >
> >
> > On Fri, Mar 21, 2014 at 3:13 PM, Jeff Squyres (jsquyres) 
> > <jsquy...@cisco.com> wrote:
> > On Mar 21, 2014, at 10:09 AM, Hamid Saeed <e.hamidsa...@gmail.com> wrote:
> >
> > > > I think I have a TCP connection. As far as I know, my cluster is not 
> > > > configured for InfiniBand (IB).
> >
> > Ok.
> >
> > > > But it fails even for TCP connections:
> > > >
> > > > mpirun -n 2 -host master,node001 --mca btl tcp,sm,self ./helloworldmpi
> > > > mpirun -n 2 -host master,node001 ./helloworldmpi
> > > >
> > > > These lines are not working; they output
> > > > an error like
> > > > [btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect] 
> > > > connect() to xx.xxx.x.xxx failed: Connection refused (111)
> >
> > What are the IP addresses reported by connect()?  (i.e., the address you 
> > X'ed out)
> >
> > Send the output from ifconfig on each of your servers.  Note that some 
> > Linux distributions do not put ifconfig in the default PATH of normal 
> > users; look for it in /sbin/ifconfig or /usr/sbin/ifconfig.
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to: 
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> >
> > --
> > _______________________________________________
> > Hamid Saeed
> > CoSynth GmbH & Co. KG
> > Escherweg 2 - 26121 Oldenburg - Germany
> > Tel +49 441 9722 738 | Fax -278
> > http://www.cosynth.com
> > _______________________________________________
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> 
> 
> 
> -- 
> _______________________________________________
> Hamid Saeed


-- 
Jeff Squyres
jsquy...@cisco.com