By default, Open MPI launches the orted daemons via ssh in a tree fashion, which basically requires that every node can ssh to every other node.

This is likely not the case on your cluster (for example, slave2 might not be able to ssh to slave4).


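As a workaround, can you try

mpirun --mca plm_rsh_no_tree_spawn 1 ...

and see whether it fixes your problem? With this option, mpirun itself ssh's directly to every node, so only the node where you run mpirun has to be able to resolve and reach the other hosts.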


Alternatively, you can simply fix name resolution (DNS, /etc/hosts, LDAP, NIS, ...) on *all* of your nodes.
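To check name resolution, you could run a small script like the one below on *each* node; it is only a sketch, and the script name (say, check_hosts.py) and its helper functions are hypothetical, not part of Open MPI. It takes the first token of every non-comment line of the hostfile as the host name and tries to resolve it with socket.gethostbyname, which on Linux goes through the system resolver configured in /etc/nsswitch.conf (so /etc/hosts, DNS, LDAP, NIS, ... are all exercised).

```python
import socket
import sys
from typing import Callable, Iterable, List


def hostfile_hosts(text: str) -> List[str]:
    """Return the host names listed in an Open MPI hostfile.

    Assumption: the host name is the first token on each line; options such
    as 'slots=4' are ignored, and blank lines and '#' comments are skipped.
    """
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line.split()[0])
    return hosts


def unresolvable(hosts: Iterable[str],
                 resolve: Callable[[str], str] = socket.gethostbyname) -> List[str]:
    """Return the hosts that this node fails to resolve.

    socket.gaierror (raised on lookup failure) is a subclass of OSError.
    """
    bad = []
    for host in hosts:
        try:
            resolve(host)
        except OSError:
            bad.append(host)
    return bad


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "mpi-hostfile"
    with open(path) as f:
        missing = unresolvable(hostfile_hosts(f.read()))
    for host in missing:
        print(f"cannot resolve {host}", file=sys.stderr)
    # Non-zero exit status if any host failed, so it can be used from a loop.
    sys.exit(1 if missing else 0)
```

Running e.g. python3 check_hosts.py mpi-hostfile on every node (not only on the one where you start mpirun) should print nothing once name resolution is fixed everywhere.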


Cheers,


Gilles


On 8/16/2016 4:45 PM, Madhuranga Rathnayake wrote:
I have a parallel setup of 6 identical machines running Linux Mint 18, with ssh and Open MPI installed.

When I execute

mpiexec -np 16 --hostfile mpi-hostfile namd2 apoa1.namd > apoa1.log

with the following hostfile:

localhost slots=4
slave1 slots=4
slave2 slots=4
slave3 slots=4
slave4 slots=4
slave5 slots=4

it gives the error

ssh: Could not resolve hostname slave3: Temporary failure in name resolution
ssh: Could not resolve hostname slave4: Temporary failure in name resolution
ssh: Could not resolve hostname slave5: Temporary failure in name resolution

If I comment out slave3, slave4 and slave5 and run mpiexec -np 12 ..., it works fine.

If I change the order, it runs with the first 3 host names.

Is there a limitation in Open MPI, or any idea how to solve this?

--
kind regards,
-Madhuranga Rathnayake | මධුරංග රත්නායක-


_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
