Do you have a firewall enabled on either node?

See this FAQ item:

    http://www.open-mpi.org/faq/?category=running#diagnose-multi-host-problems
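
A successful ping only shows that ICMP traffic gets through; the errors quoted below come from TCP connect() calls. Also note that the failing connections target 192.168.108.10 and 192.168.108.14, while the ping resolved compute-01-06 to 10.0.0.8, so it is worth testing those 192.168.108.x addresses specifically. As a rough sketch (the helper name `can_connect` and the choice of port are illustrative assumptions, not part of Open MPI), a TCP reachability check in Python could look like this:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical helper for illustration only. The port to test is an
    assumption; pick one that a process is known to be listening on.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:
        # Covers "No route to host", "Connection refused", timeouts, etc.
        print(f"connect() to {host}:{port} failed: {exc}")
        return False
```

For example, calling `can_connect("192.168.108.14", 22)` from the other compute node would try the sshd port (assuming sshd listens on that interface). Keep in mind that a firewall may allow port 22 and still block the other ports the MPI processes use, so a success here does not rule out firewalling, whereas a "No route to host" failure here would reproduce the problem outside of MPI.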



On Nov 12, 2014, at 4:57 AM, Syed Ahsan Ali <ahsansha...@gmail.com> wrote:

> Dear All
> 
> I need your advice. While trying to run an mpirun job across nodes, I get
> the following error. It seems that the two nodes, i.e., compute-01-01 and
> compute-01-06, are not able to communicate with each other, even though
> the nodes can ping each other.
> 
> [pmdtest@pmd ERA_CLM45]$ mpirun -np 16 -hostfile hostlist --mca btl ^openib ../bin/regcmMPICLM45 regcm.in
> 
> [compute-01-06.private.dns.zone][[48897,1],7][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
> connect() to 192.168.108.14 failed: No route to host (113)
> [compute-01-06.private.dns.zone][[48897,1],4][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
> connect() to 192.168.108.14 failed: No route to host (113)
> [compute-01-06.private.dns.zone][[48897,1],5][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
> connect() to 192.168.108.14 failed: No route to host (113)
> [compute-01-01.private.dns.zone][[48897,1],10][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
> [compute-01-01.private.dns.zone][[48897,1],12][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
> connect() to 192.168.108.10 failed: No route to host (113)
> [compute-01-01.private.dns.zone][[48897,1],14][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
> connect() to 192.168.108.10 failed: No route to host (113)
> connect() to 192.168.108.10 failed: No route to host (113)
> 
> mpirun: killing job...
> 
> [pmdtest@pmd ERA_CLM45]$ ssh compute-01-01
> Last login: Wed Nov 12 09:48:53 2014 from pmd-eth0.private.dns.zone
> [pmdtest@compute-01-01 ~]$ ping compute-01-06
> PING compute-01-06.private.dns.zone (10.0.0.8) 56(84) bytes of data.
> 64 bytes from compute-01-06.private.dns.zone (10.0.0.8): icmp_seq=1 ttl=64 time=0.108 ms
> 64 bytes from compute-01-06.private.dns.zone (10.0.0.8): icmp_seq=2 ttl=64 time=0.088 ms
> 
> --- compute-01-06.private.dns.zone ping statistics ---
> 2 packets transmitted, 2 received, 0% packet loss, time 999ms
> rtt min/avg/max/mdev = 0.088/0.098/0.108/0.010 ms
> [pmdtest@compute-01-01 ~]$
> 
> Thanks in advance.
> 
> Ahsan
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/11/25761.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
