[OMPI users] about using mpi-thread-multiple

2014-09-12 Thread etcamargo

Hi,

I would like to know which MPI version is recommended for making MPI calls 
from multiple threads per process, i.e., requesting MPI_THREAD_MULTIPLE in 
MPI_Init_thread().
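
For reference, a minimal sketch (standard MPI C API, not tied to any 
particular MPI implementation) of requesting MPI_THREAD_MULTIPLE and 
checking the thread level actually provided:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided;

        /* Ask for full multi-threaded support; the library reports what it grants. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

        if (provided < MPI_THREAD_MULTIPLE) {
            fprintf(stderr, "MPI_THREAD_MULTIPLE not provided (got %d)\n", provided);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        /* ... multiple threads may now issue MPI calls concurrently ... */

        MPI_Finalize();
        return 0;
    }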



Thanks,

Edson


[OMPI users] monitoring the status of processors

2015-03-17 Thread etcamargo

Hi, All

I would like to know whether there is an (MPI) tool for monitoring the 
status of a processor (and its cores) at runtime, i.e., while an MPI 
application is running.


Suppose that some physical processors become overloaded while an MPI 
application is running. I am looking for a way to identify which 
processors are "busy" or "slow".


Thanks in advance!

Edson


[OMPI users] Connection to lifeline lost

2014-01-24 Thread etcamargo

Hi, All!

I have a problem running a simple "hello world" program across different 
hosts. The hosts are virtual machines on the same network. The program 
works fine only on a single host; ssh between the machines is fine, and 
NFS is working, sharing the executable files between them.
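
For context, the test program follows the usual MPI "hello world" pattern, 
roughly like this sketch (not necessarily the exact source used):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, len;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(name, &len);

        printf("Hello from rank %d of %d on %s\n", rank, size, name);

        MPI_Finalize();
        return 0;
    }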


a) $ mpirun -hostfile machines -v -np 2 ./hello

[achel:15275] [[32727,0],0] ORTE_ERROR_LOG: Out of resource in file 
base/plm_base_launch_support.c at line 482

[latrappe:16467] OPAL dss:unpack: got type 49 when expecting type 38
[latrappe:16467] [[32727,0],1] ORTE_ERROR_LOG: Pack data mismatch in 
file ../../../orte/orted/orted_comm.c at line 235
[latrappe:16467] [[32727,0],1] routed:binomial: Connection to lifeline 
[[32727,0],0] lost



b) $ mpirun -mca plm_base_verbose 5 -hostfile machines -v -np 2 ./hello

[achel:17020] mca:base:select:(  plm) Querying component [rsh]
[achel:17020] [[INVALID],INVALID] plm:rsh_lookup on agent ssh : rsh path 
NULL
[achel:17020] mca:base:select:(  plm) Query of component [rsh] set 
priority to 10

[achel:17020] mca:base:select:(  plm) Querying component [slurm]
[achel:17020] mca:base:select:(  plm) Skipping component [slurm]. Query 
failed to return a module

[achel:17020] mca:base:select:(  plm) Selected component [rsh]
[achel:17020] plm:base:set_hnp_name: initial bias 17020 nodename hash 
2714559920

[achel:17020] plm:base:set_hnp_name: final jobfam 1536
[achel:17020] [[1536,0],0] plm:rsh_setup on agent ssh : rsh path NULL
[achel:17020] [[1536,0],0] plm:base:receive start comm
[achel:17020] released to spawn
[achel:17020] [[1536,0],0] plm:base:setup_vm
[achel:17020] [[1536,0],0] plm:base:setup_vm creating map
[achel:17020] [[1536,0],0] plm:base:setup_vm add new daemon [[1536,0],1]
[achel:17020] [[1536,0],0] plm:base:setup_vm assigning new daemon 
[[1536,0],1] to node latrappe.c3local

[achel:17020] [[1536,0],0] plm:rsh: launching vm
[achel:17020] [[1536,0],0] plm:rsh: local shell: 0 (bash)
[achel:17020] [[1536,0],0] plm:rsh: assuming same remote shell as local 
shell

[achel:17020] [[1536,0],0] plm:rsh: remote shell: 0 (bash)
[achel:17020] [[1536,0],0] plm:rsh: final template argv:
	/usr/bin/ssh   orted -mca ess env -mca orte_ess_jobid 
100663296 -mca orte_ess_vpid  -mca orte_ess_num_procs 2 -mca 
orte_hnp_uri "100663296.0;tcp://10.254.222.5:37564" -mca 
plm_base_verbose 5 -mca plm rsh

[achel:17020] [[1536,0],0] plm:rsh: launching on node latrappe.c3local
[achel:17020] [[1536,0],0] plm:rsh: recording launch of daemon 
[[1536,0],1]

[achel:17020] [[1536,0],0] plm:base:daemon_callback
[achel:17020] [[1536,0],0] plm:rsh: executing: (//usr/bin/ssh) 
[/usr/bin/ssh latrappe.c3local  orted -mca ess env -mca orte_ess_jobid 
100663296 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 -mca 
orte_hnp_uri "100663296.0;tcp://10.254.222.5:37564" -mca 
plm_base_verbose 5 -mca plm rsh]

[latrappe:18212] mca:base:select:(  plm) Querying component [rsh]
[latrappe:18212] mca:base:select:(  plm) Query of component [rsh] set 
priority to 10

[latrappe:18212] mca:base:select:(  plm) Selected component [rsh]
[achel:17020] [[1536,0],0] plm:base:orted_report_launch from daemon 
[[1536,0],1] via [[1536,0],1]
[achel:17020] [[1536,0],0] ORTE_ERROR_LOG: Out of resource in file 
base/plm_base_launch_support.c at line 482
[achel:17020] [[1536,0],0] plm:base:orted_report_launch failed for 
daemon [[1536,0],1] (via [[1536,0],1]) at contact 
100663296.1;tcp://10.254.222.7:33825
[achel:17020] [[1536,0],0] plm:base:orted_cmd sending orted_exit 
commands
[achel:17020] [[1536,0],0] plm:base:orted_cmd:orted_exit abnormal term 
ordered
[achel:17020] [[1536,0],0] plm:base:orted_cmd:orted_exit sending cmd to 
[[1536,0],1]
[achel:17020] [[1536,0],0] plm:base:orted_cmd message to [[1536,0],1] 
sent

[achel:17020] [[1536,0],0] plm:base:orted_cmd all messages sent
[achel:17020] [[1536,0],0] plm:tm: daemon launch failed on error (null)
[latrappe:18212] OPAL dss:unpack: got type 49 when expecting type 38
[latrappe:18212] [[1536,0],1] ORTE_ERROR_LOG: Pack data mismatch in file 
../../../orte/orted/orted_comm.c at line 235

[achel:17020] [[1536,0],0] plm:base:receive stop comm
[latrappe:18212] [[1536,0],1] routed:binomial: Connection to lifeline 
[[1536,0],0] lost


Thanks in advance,

Edson



Re: [OMPI users] Connection to lifeline lost

2014-01-24 Thread etcamargo
You are right. The problem was solved by using the full path of one MPI 
installation:


/home/myuser/openmpi-x/bin/mpirun -hostfile machines -np 2 ./hello

Thanks,

Edson



On 24-01-2014 16:00, Ralph Castain wrote:

Looks to me like you are picking up a different OMPI installation on
the remote node - check that your PATH and LD_LIBRARY_PATH on the
remote host are being set correctly.


[OMPI users] NAS Parallel Benchmark implementation for (open) MPI/C

2015-05-25 Thread etcamargo

Hi, All

I am looking for a NAS Parallel Benchmark (NAS-PB) reference 
implementation written in MPI/C. I see that the official NAS website 
provides an MPI/Fortran implementation.


Is there a NAS-PB reference implementation in (Open)MPI/C?

Thanks in advance,

Edson