Is your IB card in compute-01-10.private.dns.zone working?
Did you check it with ibstat?
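For example, something like this on the node (the CA name mlx4_0 is just an
example; exact fields and layout vary with your OFED version):

  $ ibstat
  CA 'mlx4_0'
      ...
      Port 1:
          State: Active
          Physical state: LinkUp
          ...

If the port shows "Down" or "Initializing" rather than "Active", the card,
the cable, or the subnet manager needs attention before Open MPI can use it.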

Do you have a dual port IB card in compute-01-15.private.dns.zone?
Did you connect both ports to the same switch on the same subnet?
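If the two ports actually go to two separate IB fabrics, then (per the warning
below) each fabric's subnet manager needs its own subnet prefix. A rough sketch
with opensm, assuming a reasonably recent OFED install (paths and the
--create-config flag may differ on yours):

  # dump a default config file, then edit it
  opensm --create-config /etc/opensm/opensm.conf
  # in that file, change the default
  #   subnet_prefix 0xfe80000000000000
  # to something unique per fabric, e.g.
  #   subnet_prefix 0xfe80000000000001

If both ports just go to the same switch on the same subnet, the GID-prefix
message is only a warning.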

TCP "no route to host":
If it is not a firewall problem, could it be a bad Ethernet port on a node, perhaps?
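A few quick checks from the node that prints the error (compute-01-03 here),
assuming standard Linux tools; errno 113 is EHOSTUNREACH, which a firewall
REJECT rule also produces:

  ping 192.168.108.10
  ip route get 192.168.108.10      # which interface/route would be used
  iptables -L -n                   # any REJECT/DROP rules?
  service iptables status          # or: systemctl status firewalld

If ping works but connect() still fails, a firewall on either end is the usual
suspect; a dead or misconfigured Ethernet port usually makes the ping fail too.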

Also, if you use host names in your hostfile, every node needs to be able to
resolve those names into IP addresses.
Check that your /etc/hosts file, DNS server, or whatever you
use for name resolution is correct and consistent across the cluster.
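A quick way to test the same resolver path the MPI processes use is getent,
e.g. on each node:

  getent hosts compute-01-10.private.dns.zone
  getent hosts 192.168.108.10

(It goes through /etc/nsswitch.conf, so it exercises /etc/hosts and DNS the
same way the application does.)

And while you are debugging, you can take the IB stack out of the picture
entirely by forcing the TCP and shared-memory BTLs, e.g.:

  mpirun --mca btl tcp,self,sm -np 72 -hostfile hostlist ../bin/regcmMPI regcm.in

(parameter values as in the 1.x series; adjust if your Open MPI version differs).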

On Jan 19, 2014, at 10:18 PM, Syed Ahsan Ali wrote:

> I agree with you and am still struggling with the subnet ID settings because I 
> couldn't find the /var/cache/opensm/opensm.opts file.
>  
> Secondly, if OMPI is falling back to TCP then it should be able to reach the 
> compute nodes, since they are reachable via ping and ssh.
> 
> 
> On Sun, Jan 19, 2014 at 9:38 PM, Ralph Castain <r...@open-mpi.org> wrote:
> If OMPI finds infiniband support on the node, it will attempt to use it. In 
> this case, it would appear you have an incorrectly configured IB adaptor on 
> the node, so you get the additional warning about that fact.
> 
> OMPI then falls back to look for another transport, in this case TCP. 
> However, the TCP transport is unable to create a socket to the remote host. 
> The most likely cause is a firewall, so you might want to check that and turn 
> it off.
> 
> 
> On Jan 19, 2014, at 4:19 AM, Syed Ahsan Ali <ahsansha...@gmail.com> wrote:
> 
>> Dear All
>>  
>> I am getting InfiniBand errors while running mpirun applications on the cluster. 
>> I get these errors even when I don't include any InfiniBand usage flags in the 
>> mpirun command. Please guide.
>>  
>> mpirun -np 72 -hostfile hostlist ../bin/regcmMPI regcm.in
>>  
>> --------------------------------------------------------------------------
>> [[59183,1],24]: A high-performance Open MPI point-to-point messaging module
>> was unable to find any relevant network interfaces:
>> Module: OpenFabrics (openib)
>>   Host: compute-01-10.private.dns.zone
>> 
>> Another transport will be used instead, although this may result in
>> lower performance.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> WARNING: There are more than one active ports on host 
>> 'compute-01-15.private.dns.zone', but the
>> default subnet GID prefix was detected on more than one of these
>> ports.  If these ports are connected to different physical IB
>> networks, this configuration will fail in Open MPI.  This version of
>> Open MPI requires that every physically separate IB subnet that is
>> used between connected MPI processes must have different subnet ID
>> values.
>> 
>> Please see this FAQ entry for more details:
>> 
>>   http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid
>> 
>> NOTE: You can turn off this warning by setting the MCA parameter
>>       btl_openib_warn_default_gid_prefix to 0.
>> --------------------------------------------------------------------------
>> 
>>   This is RegCM trunk
>>    SVN Revision: tag 4.3.5.6 compiled at: data : Sep  3 2013  time: 05:10:53
>> 
>> [pmd.pakmet.com:03309] 15 more processes have sent help message 
>> help-mpi-btl-base.txt / btl:no-nics
>> [pmd.pakmet.com:03309] Set MCA parameter "orte_base_help_aggregate" to 0 to 
>> see all help / error messages
>> [pmd.pakmet.com:03309] 47 more processes have sent help message 
>> help-mpi-btl-openib.txt / default subnet prefix
>> [compute-01-03.private.dns.zone][[59183,1],1][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
>>  connect() to 192.168.108.10 failed: No route to host (113)
>> [compute-01-03.private.dns.zone][[59183,1],2][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
>>  connect() to 192.168.108.10 failed: No route to host (113)
>> [compute-01-03.private.dns.zone][[59183,1],5][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
>>  connect() to 192.168.108.10 failed: No route to host (113)
>> [compute-01-03.private.dns.zone][[59183,1],3][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
>>  
>> [compute-01-03.private.dns.zone][[59183,1],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
>>  connect() to 192.168.108.10 failed: No route to host (113)
>> [compute-01-03.private.dns.zone][[59183,1],7][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
>>  connect() to 192.168.108.10 failed: No route to host (113)
>> connect() to 192.168.108.10 failed: No route to host (113)
>> [compute-01-03.private.dns.zone][[59183,1],6][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
>>  connect() to 192.168.108.10 failed: No route to host (113)
>> [compute-01-03.private.dns.zone][[59183,1],4][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
>>  connect() to 192.168.108.10 failed: No route to host (113)
>> 
>> Ahsan
>> 
> 
> 
> 
> 
> 
> -- 
> Syed Ahsan Ali Bokhari 
> Electronic Engineer (EE)
> 
> Research & Development Division
> Pakistan Meteorological Department H-8/4, Islamabad.
> Phone # off  +92518358714
> Cell # +923155145014
