I can't check it this week because of the Supercomputing project. It looks like you are feeding us a host list in which each entry is a userid plus a hostname expressed as an IP address. Can you convert the IP addresses to names? I think that might work around the problem until I can address it.
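
Something along these lines might do as a stopgap. It is only a sketch: the names node1 through node4 are made up, and it assumes you have made them resolvable on every node, for instance with entries in /etc/hosts. The rest of the command line is copied from your message. The script only prints the command so you can inspect it before running it.

```shell
#!/bin/sh
# Assumed /etc/hosts entries on every node (hypothetical names):
#   10.10.1.1  node1
#   10.10.1.2  node2
#   10.10.1.3  node3
#   10.10.1.4  node4

user=openmpi
hosts=""

# Build a comma-separated user@name list, using names instead of IP addresses.
for n in node1 node2 node3 node4; do
    hosts="${hosts:+$hosts,}$user@$n"
done

# Print the command rather than running it.
echo "mpirun -np 8 --host $hosts --mca oob_tcp_if_exclude lo,wlp2s0 ompi_info"
```

If the printed command looks right, paste it into the shell on the node you launch from.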
On Tue, Nov 17, 2015 at 4:19 AM, Federico Reghenzani <federico1.reghenz...@mail.polimi.it> wrote:

> I'm trying to execute this command:
>
> mpirun -np 8 --host openmpi@10.10.1.1,openmpi@10.10.1.2,openmpi@10.10.1.3,openmpi@10.10.1.4 --mca oob_tcp_if_exclude lo,wlp2s0 ompi_info
>
> Everything goes fine if I execute the same command with only 2 nodes
> (independently of which nodes).
>
> With 3 or more nodes I obtain:
> ssh: connect to host 10 port 22: Invalid argument
> followed by "ORTE was unable to reliably start one or more daemons." error.
>
> Searching with plm_base_verbose, I found:
>
> ...
> [Neptune:22627] [[53718,0],0] plm:base:setup_vm add new daemon [[53718,0],1]
> [Neptune:22627] [[53718,0],0] plm:base:setup_vm assigning new daemon [[53718,0],1] to node openmpi@10.10.1.1
> [Neptune:22627] [[53718,0],0] plm:base:setup_vm add new daemon [[53718,0],2]
> [Neptune:22627] [[53718,0],0] plm:base:setup_vm assigning new daemon [[53718,0],2] to node openmpi@10.10.1.2
> [Neptune:22627] [[53718,0],0] plm:base:setup_vm add new daemon [[53718,0],3]
> [Neptune:22627] [[53718,0],0] plm:base:setup_vm assigning new daemon [[53718,0],3] to node openmpi@10.10.1.3
> [Neptune:22627] [[53718,0],0] plm:base:setup_vm add new daemon [[53718,0],4]
> [Neptune:22627] [[53718,0],0] plm:base:setup_vm assigning new daemon [[53718,0],4] to node openmpi@10.10.1.4
> ...
> [Neptune:22627] [[53718,0],0] plm:rsh:launch daemon 0 not a child of mine
> [Neptune:22627] [[53718,0],0] plm:rsh: adding node openmpi@10.10.1.1 to launch list
> [Neptune:22627] [[53718,0],0] plm:rsh: adding node openmpi@10.10.1.2 to launch list
> [Neptune:22627] [[53718,0],0] plm:rsh:launch daemon 3 not a child of mine
> [Neptune:22627] [[53718,0],0] plm:rsh: adding node openmpi@10.10.1.4 to launch list
> ...
> [roaster-vm1:00593] [[53718,0],1] plm:rsh: remote spawn called
> [roaster-vm1:00593] [[53718,0],1] plm:rsh: local shell: 0 (bash)
> [roaster-vm1:00593] [[53718,0],1] plm:rsh: assuming same remote shell as local shell
> [roaster-vm1:00593] [[53718,0],1] plm:rsh: remote shell: 0 (bash)
> [roaster-vm1:00593] [[53718,0],1] plm:rsh: final template argv:
>     /usr/bin/ssh <template> orted --hnp-topo-sig 0N:1S:0L3:1L2:2L1:2C:2H:x86_64 -mca ess "env" -mca orte_ess_jobid "3520462848" -mca orte_ess_vpid "<template>" -mca orte_ess_num_procs "5" -mca orte_parent_uri "3520462848.1;tcp://10.10.1.1:35489" -mca orte_hnp_uri "3520462848.0;tcp://10.10.10.2:43771" --mca oob_tcp_if_exclude "lo,wlp2s0" --mca plm_base_verbose "100" -mca plm "rsh" --tree-spawn
> [roaster-vm1:00593] [[53718,0],1] plm:rsh: activating launch event
> [roaster-vm1:00593] [[53718,0],1] plm:rsh: recording launch of daemon [[53718,0],3]
> [roaster-vm1:00593] [[53718,0],1] plm:rsh: executing: (/usr/bin/ssh) [/usr/bin/ssh openmpi@10 orted --hnp-topo-sig 0N:1S:0L3:1L2:2L1:2C:2H:x86_64 -mca ess "env" -mca orte_ess_jobid "3520462848" -mca orte_ess_vpid 3 -mca orte_ess_num_procs "5" -mca orte_parent_uri "3520462848.1;tcp://10.10.1.1:35489" -mca orte_hnp_uri "3520462848.0;tcp://10.10.10.2:43771" --mca oob_tcp_if_exclude "lo,wlp2s0" --mca plm_base_verbose "100" -mca plm "rsh" --tree-spawn]
> ssh: connect to host 10 port 22: Invalid argument
>
> It seems it corrupts the IP address during remote spawn. Any idea?
>
> (I'm using 1.10.0rc7 version)
>
> Cheers,
> Federico
>
> __
> Federico Reghenzani
> M.Eng. Student @ Politecnico di Milano
> Computer Science and Engineering
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2015/11/28042.php