Have you gone to those nodes and checked the IP addresses of -all- their interfaces? OMPI must be picking up those addresses from somewhere. My best guess is that those nodes have multiple interfaces, some of which are configured with those addresses.
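One way to run that check is to capture the output of `ip -o -4 addr show` on each node and flag every address that falls outside the network you expect MPI traffic to use. The following is only a rough sketch: the sample text, the interface names (eth0, eth1), the /24 prefix lengths, and the 11.11.11.0/24 cluster network are assumptions made up for illustration, built around the two node1 addresses mentioned in this thread.

```python
import ipaddress

# Made-up sample of `ip -o -4 addr show` output for a single node.
# Interface names and prefix lengths are assumptions, not real data.
SAMPLE = """\
1: lo    inet 127.0.0.1/8 scope host lo
2: eth0    inet 11.11.11.1/24 brd 11.11.11.255 scope global eth0
3: eth1    inet 159.226.126.15/24 brd 159.226.126.255 scope global eth1
"""


def parse_ip_addr(output):
    """Map interface name -> list of IPv4 address strings."""
    result = {}
    for line in output.splitlines():
        fields = line.split()
        # Expected shape: "<index>: <ifname> inet <addr>/<prefix> ..."
        if len(fields) < 4 or fields[2] != "inet":
            continue
        addr = ipaddress.ip_interface(fields[3]).ip
        result.setdefault(fields[1], []).append(str(addr))
    return result


def unexpected_addresses(addrs_by_iface, expected_network):
    """Return non-loopback addresses that lie outside expected_network."""
    net = ipaddress.ip_network(expected_network)
    flagged = {}
    for iface, addrs in addrs_by_iface.items():
        extra = [
            a for a in addrs
            if ipaddress.ip_address(a) not in net
            and not ipaddress.ip_address(a).is_loopback
        ]
        if extra:
            flagged[iface] = extra
    return flagged


if __name__ == "__main__":
    parsed = parse_ip_addr(SAMPLE)
    # 11.11.11.0/24 is an assumed cluster network; adjust to your setup.
    print(unexpected_addresses(parsed, "11.11.11.0/24"))
```

If an interface shows up here that the other nodes cannot route to, Open MPI's btl_tcp_if_include and btl_tcp_if_exclude MCA parameters can restrict the TCP BTL to specific interfaces, for example `mpirun --mca btl_tcp_if_include eth0 ...`, where eth0 is again only a placeholder name.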
Remember: we don't look at the /etc/hosts file where mpirun is executed to get the addresses. Processes started on each remote node actually query the addresses of all available interfaces on that node. The result is frequently different than the address provided in your /etc/hosts file.

On Jul 10, 2011, at 7:45 PM, zhuangchao wrote:

> hello all :
>
> I run the following command :
>
> /data1/cluster/openmpi/bin/mpirun -d -machinefile /tmp/nodes.10515.txt
> -np 3 /data1/cluster/mpiblast-pio-1.6/bin/mpiblast -p blastn -i
> /data1/cluster/sequences/seq_4.txt -d Baculo_Nucleotide -o
> /data1/cluster/blast.out/blast.out.10515 -g T -m 0 -F F
>
> Then I get the following error from openmpi:
>
> [node7][[3812,1],2][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
> connect() to 192.168.0.5 failed: No route to host (113)
> [node7][[3812,1],2][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
> connect() to 159.226.126.15 failed: No route to host (113)
>
> The machinefile is defined as following :
>
> node1
> node5
> node7
>
> 192.168.0.5 is not the ip of hosts in the machinefile .
> 159.226.126.15 is the another ip of node1 . But hostname node1
> corresponds to 11.11.11.1 in the /etc/hosts .
>
> why do I get the error ? Can you help me ?
>
> Thank you !