Have you gone to those nodes and checked the IP addresses of -all- their
interfaces? OMPI must be picking up those addresses from somewhere. My best
guess is that those nodes have multiple interfaces, some of which are
configured with those addresses.
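A quick way to check is sketched below. It is a minimal sketch, not a tested tool. It assumes Linux with the iproute2 `ip` command available (`ip -o -4 addr show` prints one IPv4 address per line), and the function names are made up for illustration. Run it on each of node1, node5 and node7 and see whether 192.168.0.5 or 159.226.126.15 shows up on any interface.

```python
import subprocess


def parse_ipv4_addrs(text):
    """Map interface name -> list of IPv4 addresses from `ip -o -4 addr show` output."""
    addrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumed line shape: "<idx>: <ifname> inet <addr>/<prefix> ..."
        if len(fields) < 4 or fields[2] != "inet":
            continue
        addrs.setdefault(fields[1], []).append(fields[3].split("/")[0])
    return addrs


def main():
    try:
        out = subprocess.run(["ip", "-o", "-4", "addr", "show"],
                             capture_output=True, text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        # `ip` missing or failing: report it instead of crashing.
        print(f"could not run 'ip': {exc}")
        return
    # One line per interface: name followed by all of its IPv4 addresses.
    for ifname, ips in sorted(parse_ipv4_addrs(out).items()):
        print(f"{ifname:10} {' '.join(ips)}")


if __name__ == "__main__":
    main()
```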

Remember: we don't look at the /etc/hosts file on the node where mpirun is
executed to get the addresses. Each process started on a remote node queries
the addresses of all available interfaces on that node, and the result is
frequently different from the address listed in your /etc/hosts file.
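To make the difference concrete, here is a hypothetical sketch (not Open MPI code) that contrasts the two sources of addresses: what the resolver, which normally consults /etc/hosts, returns for a hostname, versus the addresses actually configured on the node's interfaces. The function names are invented for illustration, and the interface list in the demo is an assumption based on your report.

```python
import socket


def resolver_addrs(hostname):
    """IPv4 addresses the local resolver (e.g. /etc/hosts) returns for hostname."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (addr, port)).
    return {info[4][0] for info in infos}


def extra_interface_addrs(hosts_addrs, iface_addrs):
    """Interface addresses that the resolver never mentions, sorted."""
    return sorted(set(iface_addrs) - set(hosts_addrs))


if __name__ == "__main__":
    # /etc/hosts entry for node1 is from the report; its interface list is assumed.
    hosts = {"11.11.11.1"}
    ifaces = ["11.11.11.1", "159.226.126.15"]
    print(extra_interface_addrs(hosts, ifaces))  # ['159.226.126.15']
```

On node1 you could pass `resolver_addrs("node1")` as the first argument and the addresses reported by `ip -o -4 addr show` as the second; anything left over is an address that processes on that node may still find and use.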


On Jul 10, 2011, at 7:45 PM, zhuangchao wrote:

> Hello all,
>
> I run the following command:
>  
> /data1/cluster/openmpi/bin/mpirun -d -machinefile /tmp/nodes.10515.txt \
>     -np 3 /data1/cluster/mpiblast-pio-1.6/bin/mpiblast -p blastn \
>     -i /data1/cluster/sequences/seq_4.txt -d Baculo_Nucleotide \
>     -o /data1/cluster/blast.out/blast.out.10515 -g T -m 0 -F F
>  
> Then I get the following error from Open MPI:
>  
> [node7][[3812,1],2][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
>  connect() to 192.168.0.5 failed: No route to host (113)
> [node7][[3812,1],2][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
>  connect() to 159.226.126.15 failed: No route to host (113)
>  
> The machinefile is defined as follows:
>
>     node1
>     node5
>     node7
>  
> 192.168.0.5 is not the IP of any host in the machinefile. 159.226.126.15 is
> another IP of node1, but the hostname node1 corresponds to 11.11.11.1 in
> /etc/hosts.
>  
> Why do I get this error? Can you help me?
>
> Thank you!
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
