It looks to me like you are picking up a different OMPI installation on the remote 
node. Check that your PATH and LD_LIBRARY_PATH on the remote host are being 
set correctly.
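
One way to check this is sketched below. It assumes passwordless ssh, as described in the quoted message, and uses the host names that appear in the log output. The function names and the REMOTE_SH variable are hypothetical helpers made up for this sketch; they are not part of Open MPI. The commands run through a non-interactive ssh session because that is how the rsh launcher starts orted in the log ("/usr/bin/ssh latrappe.c3local orted ..."), so the report shows the environment that orted actually sees.

```shell
# Sketch only: the helper names and REMOTE_SH are hypothetical, not part of Open MPI.

# Print which mpirun/orted a non-interactive shell on the given host resolves,
# the Open MPI version line, and the search paths in effect there.
# REMOTE_SH defaults to ssh; it exists only so the launcher can be swapped out.
ompi_env_report() {
    host="$1"
    ${REMOTE_SH:-ssh} "$host" 'command -v mpirun; command -v orted; mpirun --version 2>&1 | head -n 1; echo "PATH=$PATH"; echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"'
}

# Compare two strings (for example, the version lines from two hosts).
# Prints "match" when they are identical, "MISMATCH" otherwise.
ompi_env_compare() {
    if [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "MISMATCH"
    fi
}
```

For example, running "ompi_env_report achel" and "ompi_env_report latrappe.c3local" and comparing the output should show whether both hosts resolve orted from the same installation and version. If they differ, setting PATH and LD_LIBRARY_PATH in a startup file that non-interactive shells read, or passing mpirun's --prefix option, may be enough to line them up.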

On Jan 24, 2014, at 9:41 AM, etcamargo <etcama...@inf.ufpr.br> wrote:

> Hi, All!
> 
> I have a problem running a simple "hello world" program across different 
> hosts. The hosts are virtual machines on the same network. The program 
> works fine only on a single host. ssh works between the machines, and NFS is 
> working and shares the executable files between the machines.
> 
> a) $ mpirun -hostfile machines -v -np 2 ./hello
> 
> [achel:15275] [[32727,0],0] ORTE_ERROR_LOG: Out of resource in file 
> base/plm_base_launch_support.c at line 482
> [latrappe:16467] OPAL dss:unpack: got type 49 when expecting type 38
> [latrappe:16467] [[32727,0],1] ORTE_ERROR_LOG: Pack data mismatch in file 
> ../../../orte/orted/orted_comm.c at line 235
> [latrappe:16467] [[32727,0],1] routed:binomial: Connection to lifeline 
> [[32727,0],0] lost
> 
> 
> b) $ mpirun -mca plm_base_verbose 5 -hostfile machines -v -np 2 ./hello
> 
> [achel:17020] mca:base:select:(  plm) Querying component [rsh]
> [achel:17020] [[INVALID],INVALID] plm:rsh_lookup on agent ssh : rsh path NULL
> [achel:17020] mca:base:select:(  plm) Query of component [rsh] set priority 
> to 10
> [achel:17020] mca:base:select:(  plm) Querying component [slurm]
> [achel:17020] mca:base:select:(  plm) Skipping component [slurm]. Query 
> failed to return a module
> [achel:17020] mca:base:select:(  plm) Selected component [rsh]
> [achel:17020] plm:base:set_hnp_name: initial bias 17020 nodename hash 
> 2714559920
> [achel:17020] plm:base:set_hnp_name: final jobfam 1536
> [achel:17020] [[1536,0],0] plm:rsh_setup on agent ssh : rsh path NULL
> [achel:17020] [[1536,0],0] plm:base:receive start comm
> [achel:17020] released to spawn
> [achel:17020] [[1536,0],0] plm:base:setup_vm
> [achel:17020] [[1536,0],0] plm:base:setup_vm creating map
> [achel:17020] [[1536,0],0] plm:base:setup_vm add new daemon [[1536,0],1]
> [achel:17020] [[1536,0],0] plm:base:setup_vm assigning new daemon 
> [[1536,0],1] to node latrappe.c3local
> [achel:17020] [[1536,0],0] plm:rsh: launching vm
> [achel:17020] [[1536,0],0] plm:rsh: local shell: 0 (bash)
> [achel:17020] [[1536,0],0] plm:rsh: assuming same remote shell as local shell
> [achel:17020] [[1536,0],0] plm:rsh: remote shell: 0 (bash)
> [achel:17020] [[1536,0],0] plm:rsh: final template argv:
>       /usr/bin/ssh <template>  orted -mca ess env -mca orte_ess_jobid 
> 100663296 -mca orte_ess_vpid <template> -mca orte_ess_num_procs 2 -mca 
> orte_hnp_uri "100663296.0;tcp://10.254.222.5:37564" -mca plm_base_verbose 5 
> -mca plm rsh
> [achel:17020] [[1536,0],0] plm:rsh: launching on node latrappe.c3local
> [achel:17020] [[1536,0],0] plm:rsh: recording launch of daemon [[1536,0],1]
> [achel:17020] [[1536,0],0] plm:base:daemon_callback
> [achel:17020] [[1536,0],0] plm:rsh: executing: (//usr/bin/ssh) [/usr/bin/ssh 
> latrappe.c3local  orted -mca ess env -mca orte_ess_jobid 100663296 -mca 
> orte_ess_vpid 1 -mca orte_ess_num_procs 2 -mca orte_hnp_uri 
> "100663296.0;tcp://10.254.222.5:37564" -mca plm_base_verbose 5 -mca plm rsh]
> [latrappe:18212] mca:base:select:(  plm) Querying component [rsh]
> [latrappe:18212] mca:base:select:(  plm) Query of component [rsh] set 
> priority to 10
> [latrappe:18212] mca:base:select:(  plm) Selected component [rsh]
> [achel:17020] [[1536,0],0] plm:base:orted_report_launch from daemon 
> [[1536,0],1] via [[1536,0],1]
> [achel:17020] [[1536,0],0] ORTE_ERROR_LOG: Out of resource in file 
> base/plm_base_launch_support.c at line 482
> [achel:17020] [[1536,0],0] plm:base:orted_report_launch failed for daemon 
> [[1536,0],1] (via [[1536,0],1]) at contact 
> 100663296.1;tcp://10.254.222.7:33825
> [achel:17020] [[1536,0],0] plm:base:orted_cmd sending orted_exit commands
> [achel:17020] [[1536,0],0] plm:base:orted_cmd:orted_exit abnormal term ordered
> [achel:17020] [[1536,0],0] plm:base:orted_cmd:orted_exit sending cmd to 
> [[1536,0],1]
> [achel:17020] [[1536,0],0] plm:base:orted_cmd message to [[1536,0],1] sent
> [achel:17020] [[1536,0],0] plm:base:orted_cmd all messages sent
> [achel:17020] [[1536,0],0] plm:tm: daemon launch failed on error (null)
> [latrappe:18212] OPAL dss:unpack: got type 49 when expecting type 38
> [latrappe:18212] [[1536,0],1] ORTE_ERROR_LOG: Pack data mismatch in file 
> ../../../orte/orted/orted_comm.c at line 235
> [achel:17020] [[1536,0],0] plm:base:receive stop comm
> [latrappe:18212] [[1536,0],1] routed:binomial: Connection to lifeline 
> [[1536,0],0] lost
> 
> Thanks in advance,
> 
> Edson
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users