By default, Open MPI spawns orted via ssh in a tree fashion, which
basically requires that all nodes can ssh to each other.
This is likely not the case in your setup (for example, slave2 might not
be able to ssh to slave4).
As a workaround, can you try
mpirun --mca plm_rsh_no_tree_spawn 1 ...
and see whether it fixes your problem?
Or you can simply fix name resolution (DNS, /etc/hosts, LDAP, NIS, ...)
on *all* your nodes.
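To find out which names a given node cannot resolve, something like the
sketch below might help. It is only an illustration: the function name
unresolved_hosts is made up here, and it assumes the hostfile format you
posted (hostname first, then slots=N) and that getent is available on your
nodes.

```shell
# Hypothetical helper (not part of Open MPI): print every hostfile entry
# that the local node cannot resolve.
unresolved_hosts() {
    hostfile=$1
    # The first field of each non-empty, non-comment line is the hostname.
    awk 'NF && $1 !~ /^#/ { print $1 }' "$hostfile" |
    while read -r host; do
        # getent queries the NSS hosts database (files, dns, ldap, nis, ...).
        getent hosts "$host" >/dev/null 2>&1 || echo "$host"
    done
}
```

Running "unresolved_hosts mpi-hostfile" on each node in turn should print
nothing once name resolution is fixed everywhere. Any name it does print
on a node is one that node will fail to reach during a tree spawn.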
Cheers,
Gilles
On 8/16/2016 4:45 PM, Madhuranga Rathnayake wrote:
I have a parallel setup of 6 identical machines with Linux Mint 18,
ssh and Open MPI.
When I execute this,
mpiexec -np 16 --hostfile mpi-hostfile namd2 apoa1.namd > apoa1.log
with the following host file
localhost slots=4
slave1 slots=4
slave2 slots=4
slave3 slots=4
slave4 slots=4
slave5 slots=4
it gives this error:
ssh: Could not resolve hostname slave3: Temporary failure in name resolution
ssh: Could not resolve hostname slave4: Temporary failure in name resolution
ssh: Could not resolve hostname slave5: Temporary failure in name resolution
If I comment out slave3, slave4 and slave5 and run mpiexec -np 12 ..., it works fine.
If I change the order, it runs with the first 3 host names.
Is there any limitation in Open MPI? Or any idea how to solve this?
--
kind regards,
-Madhuranga Rathnayake | මධුරංග රත්නායක-
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users