Dear All, I need your advice. When I run an mpirun job across nodes, I get the following error. It seems that the two nodes, compute-01-01 and compute-01-06, cannot communicate with each other, even though they can ping each other.
[pmdtest@pmd ERA_CLM45]$ mpirun -np 16 -hostfile hostlist --mca btl ^openib ../bin/regcmMPICLM45 regcm.in
[compute-01-06.private.dns.zone][[48897,1],7][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.108.14 failed: No route to host (113)
[compute-01-06.private.dns.zone][[48897,1],4][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.108.14 failed: No route to host (113)
[compute-01-06.private.dns.zone][[48897,1],5][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.108.14 failed: No route to host (113)
[compute-01-01.private.dns.zone][[48897,1],10][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] [compute-01-01.private.dns.zone][[48897,1],12][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.108.10 failed: No route to host (113)
[compute-01-01.private.dns.zone][[48897,1],14][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.108.10 failed: No route to host (113)
connect() to 192.168.108.10 failed: No route to host (113)
mpirun: killing job...

[pmdtest@pmd ERA_CLM45]$ ssh compute-01-01
Last login: Wed Nov 12 09:48:53 2014 from pmd-eth0.private.dns.zone
[pmdtest@compute-01-01 ~]$ ping compute-01-06
PING compute-01-06.private.dns.zone (10.0.0.8) 56(84) bytes of data.
64 bytes from compute-01-06.private.dns.zone (10.0.0.8): icmp_seq=1 ttl=64 time=0.108 ms
64 bytes from compute-01-06.private.dns.zone (10.0.0.8): icmp_seq=2 ttl=64 time=0.088 ms
--- compute-01-06.private.dns.zone ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.088/0.098/0.108/0.010 ms
[pmdtest@compute-01-01 ~]$

Thanks in advance.
Ahsan
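One detail in the transcript above: the ping resolves compute-01-06 to 10.0.0.8, whereas every failed connect() targets 192.168.108.10 or 192.168.108.14. So the successful ping does not exercise the addresses Open MPI actually tried. The following is a minimal sketch for repeating a plain TCP connect() against those exact addresses. It assumes Python 3 is available on the compute nodes; the script name tcp_probe.py and port 5000 are hypothetical, and any port works. If nothing listens on the port, "ECONNREFUSED" generally means the packet reached the host. "EHOSTUNREACH (113)" reproduces the MPI error, which a missing route or a firewall rule rejecting the connection can both cause. A timeout typically means packets are being silently dropped.

```python
import errno
import socket
import sys


def tcp_probe(host, port, timeout=2.0):
    """Attempt a plain TCP connect() to host:port.

    Returns None if the connection succeeds, otherwise a short string
    naming the error (e.g. "EHOSTUNREACH (113): No route to host").
    """
    try:
        # create_connection() resolves the host and calls connect();
        # the socket is closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except socket.timeout:
        return "timed out"
    except OSError as exc:
        # Map the numeric errno to its symbolic name when one exists.
        name = errno.errorcode.get(exc.errno, "unknown") if exc.errno else "unknown"
        return f"{name} ({exc.errno}): {exc.strerror}"


if __name__ == "__main__":
    if len(sys.argv) == 3:
        err = tcp_probe(sys.argv[1], int(sys.argv[2]))
        print("connected" if err is None else f"connect() failed: {err}")
    else:
        print("usage: python3 tcp_probe.py ADDRESS PORT")
```

For example, running `python3 tcp_probe.py 192.168.108.14 5000` on compute-01-06, and `python3 tcp_probe.py 192.168.108.10 5000` on compute-01-01, would show whether a plain TCP connection fails the same way outside of MPI.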