Do you have any firewalling enabled on these machines?  If so, you'll want to 
either disable it or allow TCP connections on arbitrary ports between all of 
the cluster nodes.
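
As a rough way to check whether plain TCP connections get through between two 
nodes, independent of MPI, something like the Python sketch below could be run 
on one node against a port where some listener is running on the other node.  
The function name can_connect is my own placeholder, not anything from 
Open MPI, and the port you test is whatever you choose to listen on.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds, else False.

    Hypothetical helper for illustration only; it is not part of Open MPI.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused" (errno 111 on Linux), timeouts,
        # unreachable hosts, and name-resolution failures.
        return False
```

For example, with a listener started on wirth on a port of your choosing, 
calling can_connect("wirth", <that port>) from karp should return True if 
nothing is blocking the connection.  A False result only tells you the connect 
failed; it does not by itself prove that a firewall is the cause.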


On Mar 21, 2014, at 10:24 AM, Hamid Saeed <e.hamidsa...@gmail.com> wrote:

> /sbin/ifconfig
> 
> hsaeed@karp:~$ /sbin/ifconfig
> br0       Link encap:Ethernet  HWaddr 00:25:90:59:c9:ba
>           inet addr:134.106.3.231  Bcast:134.106.3.255  Mask:255.255.255.0
>           inet6 addr: fe80::225:90ff:fe59:c9ba/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:49080961 errors:0 dropped:50263 overruns:0 frame:0
>           TX packets:43279252 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:41348407558 (38.5 GiB)  TX bytes:80505842745 (74.9 GiB)
> 
> br1       Link encap:Ethernet  HWaddr 00:25:90:59:c9:bb
>           inet addr:134.106.53.231  Bcast:134.106.53.255  Mask:255.255.255.0
>           inet6 addr: fe80::225:90ff:fe59:c9bb/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:41573060 errors:0 dropped:50261 overruns:0 frame:0
>           TX packets:1693509 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:6177072160 (5.7 GiB)  TX bytes:230617435 (219.9 MiB)
> 
> br2       Link encap:Ethernet  HWaddr 00:c0:0a:ec:02:e7
>           inet addr:10.231.2.231  Bcast:10.231.2.255  Mask:255.255.255.0
>           UP BROADCAST MULTICAST  MTU:1500  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
> 
> eth0      Link encap:Ethernet  HWaddr 00:25:90:59:c9:ba
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:69108377 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:86459066 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:43533091399 (40.5 GiB)  TX bytes:83359370885 (77.6 GiB)
>           Memory:dfe60000-dfe80000
> 
> eth1      Link encap:Ethernet  HWaddr 00:25:90:59:c9:bb
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:43531546 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:1716151 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:7201915977 (6.7 GiB)  TX bytes:232026383 (221.2 MiB)
>           Memory:dfee0000-dff00000
> 
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           inet6 addr: ::1/128 Scope:Host
>           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>           RX packets:10890707 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:10890707 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:36194379576 (33.7 GiB)  TX bytes:36194379576 (33.7 GiB)
> 
> tap0      Link encap:Ethernet  HWaddr 00:c0:0a:ec:02:e7
>           UP BROADCAST MULTICAST  MTU:1500  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:500
>           RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
> 
> When I execute the following line:
> 
> hsaeed@karp:~/Task4_mpi/scatterv$ mpiexec -n 2 -host wirth,karp ./a.out
> 
> I receive this error:
> 
> [wirth][[59430,1],0][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect]
>  connect() to 10.231.2.231 failed: Connection refused (111)
> 
> 
> NOTE: karp and wirth are two machines in an SSH cluster.
> 
> 
> 
> 
> On Fri, Mar 21, 2014 at 3:13 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
> wrote:
> On Mar 21, 2014, at 10:09 AM, Hamid Saeed <e.hamidsa...@gmail.com> wrote:
> 
> > > I think I have a TCP connection. As far as I know, my cluster is not 
> > > configured for InfiniBand (IB).
> 
> Ok.
> 
> > > But even for TCP connections:
> > >
> > > mpirun -n 2 -host master,node001 --mca btl tcp,sm,self ./helloworldmpi
> > > mpirun -n 2 -host master,node001 ./helloworldmpi
> > >
> > > These lines do not work; they output
> > > an error like:
> > > [btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect] connect() 
> > > to xx.xxx.x.xxx failed: Connection refused (111)
> 
> What are the IP addresses reported by connect()?  (i.e., the address you X'ed 
> out)
> 
> Send the output from ifconfig on each of your servers.  Note that some Linux 
> distributions do not put ifconfig in the default PATH of normal users; look 
> for it in /sbin/ifconfig or /usr/sbin/ifconfig.
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> 
> -- 
> _______________________________________________
> Hamid Saeed
> CoSynth GmbH & Co. KG
> Escherweg 2 - 26121 Oldenburg - Germany
> Tel +49 441 9722 738 | Fax -278
> http://www.cosynth.com
> _______________________________________________


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
