I'm trying to execute this command:
mpirun -np 8 --host openmpi@10.10.1.1,openmpi@10.10.1.2,openmpi@10.10.1.3,openmpi@10.10.1.4 --mca oob_tcp_if_exclude lo,wlp2s0 ompi_info

Everything goes fine if I execute the same command with only 2 nodes (regardless of which nodes). With 3 or more nodes I get:

ssh: connect to host 10 port 22: Invalid argument

followed by the error "ORTE was unable to reliably start one or more daemons."

Running with plm_base_verbose, I found:

...
[Neptune:22627] [[53718,0],0] plm:base:setup_vm add new daemon [[53718,0],1]
[Neptune:22627] [[53718,0],0] plm:base:setup_vm assigning new daemon [[53718,0],1] to node openmpi@10.10.1.1
[Neptune:22627] [[53718,0],0] plm:base:setup_vm add new daemon [[53718,0],2]
[Neptune:22627] [[53718,0],0] plm:base:setup_vm assigning new daemon [[53718,0],2] to node openmpi@10.10.1.2
[Neptune:22627] [[53718,0],0] plm:base:setup_vm add new daemon [[53718,0],3]
[Neptune:22627] [[53718,0],0] plm:base:setup_vm assigning new daemon [[53718,0],3] to node openmpi@10.10.1.3
[Neptune:22627] [[53718,0],0] plm:base:setup_vm add new daemon [[53718,0],4]
[Neptune:22627] [[53718,0],0] plm:base:setup_vm assigning new daemon [[53718,0],4] to node openmpi@10.10.1.4
...
[Neptune:22627] [[53718,0],0] plm:rsh:launch daemon 0 not a child of mine
[Neptune:22627] [[53718,0],0] plm:rsh: adding node openmpi@10.10.1.1 to launch list
[Neptune:22627] [[53718,0],0] plm:rsh: adding node openmpi@10.10.1.2 to launch list
[Neptune:22627] [[53718,0],0] plm:rsh:launch daemon 3 not a child of mine
[Neptune:22627] [[53718,0],0] plm:rsh: adding node openmpi@10.10.1.4 to launch list
...
[roaster-vm1:00593] [[53718,0],1] plm:rsh: remote spawn called
[roaster-vm1:00593] [[53718,0],1] plm:rsh: local shell: 0 (bash)
[roaster-vm1:00593] [[53718,0],1] plm:rsh: assuming same remote shell as local shell
[roaster-vm1:00593] [[53718,0],1] plm:rsh: remote shell: 0 (bash)
[roaster-vm1:00593] [[53718,0],1] plm:rsh: final template argv: /usr/bin/ssh <template> orted --hnp-topo-sig 0N:1S:0L3:1L2:2L1:2C:2H:x86_64 -mca ess "env" -mca orte_ess_jobid "3520462848" -mca orte_ess_vpid "<template>" -mca orte_ess_num_procs "5" -mca orte_parent_uri "3520462848.1;tcp://10.10.1.1:35489" -mca orte_hnp_uri "3520462848.0;tcp://10.10.10.2:43771" --mca oob_tcp_if_exclude "lo,wlp2s0" --mca plm_base_verbose "100" -mca plm "rsh" --tree-spawn
[roaster-vm1:00593] [[53718,0],1] plm:rsh: activating launch event
[roaster-vm1:00593] [[53718,0],1] plm:rsh: recording launch of daemon [[53718,0],3]
[roaster-vm1:00593] [[53718,0],1] plm:rsh: executing: (/usr/bin/ssh) [/usr/bin/ssh openmpi@10 orted --hnp-topo-sig 0N:1S:0L3:1L2:2L1:2C:2H:x86_64 -mca ess "env" -mca orte_ess_jobid "3520462848" -mca orte_ess_vpid 3 -mca orte_ess_num_procs "5" -mca orte_parent_uri "3520462848.1;tcp://10.10.1.1:35489" -mca orte_hnp_uri "3520462848.0;tcp://10.10.10.2:43771" --mca oob_tcp_if_exclude "lo,wlp2s0" --mca plm_base_verbose "100" -mca plm "rsh" --tree-spawn]

ssh: connect to host 10 port 22: Invalid argument

It seems the IP address gets corrupted during the remote spawn: the command executed on roaster-vm1 targets openmpi@10 instead of the full address of the node. Any ideas?

(I'm using Open MPI version 1.10.0rc7.)

Cheers,
Federico

__
Federico Reghenzani
M.Eng. Student @ Politecnico di Milano
Computer Science and Engineering