You are right. The problem was solved by giving the full path to the mpirun of one Open MPI installation:

/home/myuser/openmpi-x/bin/mpirun -hostfile machines -np 2 ./hello
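For context, the Open MPI mpirun man page says that invoking mpirun by an absolute path is equivalent to passing `--prefix` with the installation directory (the parent of `bin/`). With `--prefix`, mpirun sets PATH and LD_LIBRARY_PATH on each remote node before starting orted there. The spelled-out form of the command above would presumably look like this. The prefix `/home/myuser/openmpi-x` is taken from the command above, and the sketch assumes the same path exists on every node.

```shell
# Spelled-out form of the absolute-path invocation above (a sketch):
# --prefix asks mpirun to set PATH and LD_LIBRARY_PATH on each remote node
# from /home/myuser/openmpi-x before launching orted, so the local mpirun
# and the remote orted come from the same Open MPI build.
/home/myuser/openmpi-x/bin/mpirun --prefix /home/myuser/openmpi-x \
    -hostfile machines -np 2 ./hello
```

The explicit flag matters mostly when mpirun is called by bare name. Calling it by absolute path already implies the same prefix.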

Thanks,

Edson



On 24-01-2014 16:00, Ralph Castain wrote:
Looks to me like you are picking up a different OMPI installation on
the remote node. Check that your PATH and LD_LIBRARY_PATH on the
remote host are being set correctly.
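One way to follow this advice is to ask each host in the hostfile which Open MPI a non-interactive ssh session finds. mpirun uses that kind of session to start orted. The sketch below assumes bash, passwordless ssh, and the hostfile `machines` from this thread with the hostname in the first column. Single quotes keep `$LD_LIBRARY_PATH` from expanding locally, so it is evaluated on the remote side.

```shell
#!/usr/bin/env bash
# Sketch: print, for every host in the "machines" hostfile, which mpirun and
# orted a non-interactive ssh shell finds, the Open MPI version, and the
# remote LD_LIBRARY_PATH. All hosts should report the same installation.
while read -r host _; do
  # Skip blank lines and comment lines in the hostfile.
  case "$host" in ''|'#'*) continue ;; esac
  echo "== $host"
  # -n stops ssh from reading the loop's stdin (the rest of the hostfile).
  ssh -n "$host" 'command -v mpirun orted; mpirun --version 2>&1 | head -n 1; echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"'
done < machines
```

If the hosts disagree, the usual fix is to prepend the intended installation's `bin` directory to PATH and its `lib` directory to LD_LIBRARY_PATH. Do this in the startup file that non-interactive ssh sessions read on every node. For bash that is typically `~/.bashrc`, placed above any early return for non-interactive shells. The `lib` location assumes the default Open MPI libdir.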
On Jan 24, 2014, at 9:41 AM, etcamargo <etcama...@inf.ufpr.br> wrote:

Hi, All!

I have a problem running a simple "hello world" program across different hosts. The hosts are virtual machines on the same network. The program works fine on a single host. ssh works between the machines, and NFS is working and shares the executable files between them.

a) $ mpirun -hostfile machines -v -np 2 ./hello

[achel:15275] [[32727,0],0] ORTE_ERROR_LOG: Out of resource in file base/plm_base_launch_support.c at line 482
[latrappe:16467] OPAL dss:unpack: got type 49 when expecting type 38
[latrappe:16467] [[32727,0],1] ORTE_ERROR_LOG: Pack data mismatch in file ../../../orte/orted/orted_comm.c at line 235
[latrappe:16467] [[32727,0],1] routed:binomial: Connection to lifeline [[32727,0],0] lost


b) $ mpirun -mca plm_base_verbose 5 -hostfile machines -v -np 2 ./hello

[achel:17020] mca:base:select:(  plm) Querying component [rsh]
[achel:17020] [[INVALID],INVALID] plm:rsh_lookup on agent ssh : rsh path NULL
[achel:17020] mca:base:select:( plm) Query of component [rsh] set priority to 10
[achel:17020] mca:base:select:(  plm) Querying component [slurm]
[achel:17020] mca:base:select:( plm) Skipping component [slurm]. Query failed to return a module
[achel:17020] mca:base:select:(  plm) Selected component [rsh]
[achel:17020] plm:base:set_hnp_name: initial bias 17020 nodename hash 2714559920
[achel:17020] plm:base:set_hnp_name: final jobfam 1536
[achel:17020] [[1536,0],0] plm:rsh_setup on agent ssh : rsh path NULL
[achel:17020] [[1536,0],0] plm:base:receive start comm
[achel:17020] released to spawn
[achel:17020] [[1536,0],0] plm:base:setup_vm
[achel:17020] [[1536,0],0] plm:base:setup_vm creating map
[achel:17020] [[1536,0],0] plm:base:setup_vm add new daemon [[1536,0],1]
[achel:17020] [[1536,0],0] plm:base:setup_vm assigning new daemon [[1536,0],1] to node latrappe.c3local
[achel:17020] [[1536,0],0] plm:rsh: launching vm
[achel:17020] [[1536,0],0] plm:rsh: local shell: 0 (bash)
[achel:17020] [[1536,0],0] plm:rsh: assuming same remote shell as local shell
[achel:17020] [[1536,0],0] plm:rsh: remote shell: 0 (bash)
[achel:17020] [[1536,0],0] plm:rsh: final template argv:
/usr/bin/ssh <template> orted -mca ess env -mca orte_ess_jobid 100663296 -mca orte_ess_vpid <template> -mca orte_ess_num_procs 2 -mca orte_hnp_uri "100663296.0;tcp://10.254.222.5:37564" -mca plm_base_verbose 5 -mca plm rsh
[achel:17020] [[1536,0],0] plm:rsh: launching on node latrappe.c3local
[achel:17020] [[1536,0],0] plm:rsh: recording launch of daemon [[1536,0],1]
[achel:17020] [[1536,0],0] plm:base:daemon_callback
[achel:17020] [[1536,0],0] plm:rsh: executing: (//usr/bin/ssh) [/usr/bin/ssh latrappe.c3local orted -mca ess env -mca orte_ess_jobid 100663296 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 -mca orte_hnp_uri "100663296.0;tcp://10.254.222.5:37564" -mca plm_base_verbose 5 -mca plm rsh]
[latrappe:18212] mca:base:select:(  plm) Querying component [rsh]
[latrappe:18212] mca:base:select:( plm) Query of component [rsh] set priority to 10
[latrappe:18212] mca:base:select:(  plm) Selected component [rsh]
[achel:17020] [[1536,0],0] plm:base:orted_report_launch from daemon [[1536,0],1] via [[1536,0],1]
[achel:17020] [[1536,0],0] ORTE_ERROR_LOG: Out of resource in file base/plm_base_launch_support.c at line 482
[achel:17020] [[1536,0],0] plm:base:orted_report_launch failed for daemon [[1536,0],1] (via [[1536,0],1]) at contact 100663296.1;tcp://10.254.222.7:33825
[achel:17020] [[1536,0],0] plm:base:orted_cmd sending orted_exit commands
[achel:17020] [[1536,0],0] plm:base:orted_cmd:orted_exit abnormal term ordered
[achel:17020] [[1536,0],0] plm:base:orted_cmd:orted_exit sending cmd to [[1536,0],1]
[achel:17020] [[1536,0],0] plm:base:orted_cmd message to [[1536,0],1] sent
[achel:17020] [[1536,0],0] plm:base:orted_cmd all messages sent
[achel:17020] [[1536,0],0] plm:tm: daemon launch failed on error (null)
[latrappe:18212] OPAL dss:unpack: got type 49 when expecting type 38
[latrappe:18212] [[1536,0],1] ORTE_ERROR_LOG: Pack data mismatch in file ../../../orte/orted/orted_comm.c at line 235
[achel:17020] [[1536,0],0] plm:base:receive stop comm
[latrappe:18212] [[1536,0],1] routed:binomial: Connection to lifeline [[1536,0],0] lost

Thanks in advance,

Edson

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
