On 11-Aug-09, at 6:28 AM, Ralph Castain wrote:

> -mca plm_base_verbose 5 --debug-daemons -mca odls_base_verbose 5
>
> I'm afraid the output will be a tad verbose, but I would appreciate
> seeing it. Might also tell us something about the lib issue.
The command line was:

/usr/local/openmpi/bin/mpirun -mca plm_base_verbose 5 --debug-daemons -mca odls_base_verbose 5 -n 16 --host xserve03,xserve04 ../build/mitgcmuv

Starting: ../results//TasGaussRestart16
[saturna.cluster:07360] mca:base:select:( plm) Querying component [rsh]
[saturna.cluster:07360] mca:base:select:( plm) Query of component
[rsh] set priority to 10
[saturna.cluster:07360] mca:base:select:( plm) Querying component
[slurm]
[saturna.cluster:07360] mca:base:select:( plm) Skipping component
[slurm]. Query failed to return a module
[saturna.cluster:07360] mca:base:select:( plm) Querying component [tm]
[saturna.cluster:07360] mca:base:select:( plm) Skipping component
[tm]. Query failed to return a module
[saturna.cluster:07360] mca:base:select:( plm) Querying component
[xgrid]
[saturna.cluster:07360] mca:base:select:( plm) Skipping component
[xgrid]. Query failed to return a module
[saturna.cluster:07360] mca:base:select:( plm) Selected component [rsh]
[saturna.cluster:07360] plm:base:set_hnp_name: initial bias 7360
nodename hash 1656374957
[saturna.cluster:07360] plm:base:set_hnp_name: final jobfam 14551
[saturna.cluster:07360] [[14551,0],0] plm:base:receive start comm
[saturna.cluster:07360] mca: base: component_find: ras
"mca_ras_dash_host" uses an MCA interface that is not recognized
(component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
[saturna.cluster:07360] mca: base: component_find: ras
"mca_ras_hostfile" uses an MCA interface that is not recognized
(component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
[saturna.cluster:07360] mca: base: component_find: ras
"mca_ras_localhost" uses an MCA interface that is not recognized
(component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
[saturna.cluster:07360] mca: base: component_find: ras "mca_ras_xgrid"
uses an MCA interface that is not recognized (component MCA v1.0.0 !=
supported MCA v2.0.0) -- ignored
[saturna.cluster:07360] mca:base:select:( odls) Querying component
[default]
[saturna.cluster:07360] mca:base:select:( odls) Query of component
[default] set priority to 1
[saturna.cluster:07360] mca:base:select:( odls) Selected component
[default]
[saturna.cluster:07360] mca: base: component_find: iof "mca_iof_proxy"
uses an MCA interface that is not recognized (component MCA v1.0.0 !=
supported MCA v2.0.0) -- ignored
[saturna.cluster:07360] mca: base: component_find: iof "mca_iof_svc"
uses an MCA interface that is not recognized (component MCA v1.0.0 !=
supported MCA v2.0.0) -- ignored
[saturna.cluster:07360] [[14551,0],0] plm:rsh: setting up job [14551,1]
[saturna.cluster:07360] [[14551,0],0] plm:base:setup_job for job
[14551,1]
[saturna.cluster:07360] [[14551,0],0] plm:rsh: local shell: 0 (bash)
[saturna.cluster:07360] [[14551,0],0] plm:rsh: assuming same remote
shell as local shell
[saturna.cluster:07360] [[14551,0],0] plm:rsh: remote shell: 0 (bash)
[saturna.cluster:07360] [[14551,0],0] plm:rsh: final template argv: /usr/bin/ssh <template> PATH=/usr/local/openmpi/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; /usr/local/openmpi/bin/orted --debug-daemons -mca ess env -mca orte_ess_jobid 953614336 -mca orte_ess_vpid <template> -mca orte_ess_num_procs 3 --hnp-uri "953614336.0;tcp://142.104.154.96:49622;tcp://192.168.2.254:49622" -mca plm_base_verbose 5 -mca odls_base_verbose 5
[saturna.cluster:07360] [[14551,0],0] plm:rsh: launching on node
xserve03
[saturna.cluster:07360] [[14551,0],0] plm:rsh: recording launch of
daemon [[14551,0],1]
[saturna.cluster:07360] [[14551,0],0] plm:rsh: executing: (//usr/bin/ssh) [/usr/bin/ssh xserve03 PATH=/usr/local/openmpi/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; /usr/local/openmpi/bin/orted --debug-daemons -mca ess env -mca orte_ess_jobid 953614336 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 3 --hnp-uri "953614336.0;tcp://142.104.154.96:49622;tcp://192.168.2.254:49622" -mca plm_base_verbose 5 -mca odls_base_verbose 5]
Daemon was launched on xserve03.local - beginning to initialize
[xserve03.local:40708] mca:base:select:( odls) Querying component
[default]
[xserve03.local:40708] mca:base:select:( odls) Query of component
[default] set priority to 1
[xserve03.local:40708] mca:base:select:( odls) Selected component
[default]
[xserve03.local:40708] mca: base: component_find: iof "mca_iof_proxy"
uses an MCA interface that is not recognized (component MCA v1.0.0 !=
supported MCA v2.0.0) -- ignored
[xserve03.local:40708] mca: base: component_find: iof "mca_iof_svc"
uses an MCA interface that is not recognized (component MCA v1.0.0 !=
supported MCA v2.0.0) -- ignored
Daemon [[14551,0],1] checking in as pid 40708 on host xserve03.local
Daemon [[14551,0],1] not using static ports
[saturna.cluster:07360] [[14551,0],0] plm:rsh: launching on node
xserve04
[saturna.cluster:07360] [[14551,0],0] plm:rsh: recording launch of
daemon [[14551,0],2]
[saturna.cluster:07360] [[14551,0],0] plm:rsh: executing: (//usr/bin/ssh) [/usr/bin/ssh xserve04 PATH=/usr/local/openmpi/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; /usr/local/openmpi/bin/orted --debug-daemons -mca ess env -mca orte_ess_jobid 953614336 -mca orte_ess_vpid 2 -mca orte_ess_num_procs 3 --hnp-uri "953614336.0;tcp://142.104.154.96:49622;tcp://192.168.2.254:49622" -mca plm_base_verbose 5 -mca odls_base_verbose 5]
Daemon was launched on xserve04.local - beginning to initialize
[xserve04.local:40450] mca:base:select:( odls) Querying component
[default]
[xserve04.local:40450] mca:base:select:( odls) Query of component
[default] set priority to 1
[xserve04.local:40450] mca:base:select:( odls) Selected component
[default]
[xserve04.local:40450] mca: base: component_find: iof "mca_iof_proxy"
uses an MCA interface that is not recognized (component MCA v1.0.0 !=
supported MCA v2.0.0) -- ignored
[xserve04.local:40450] mca: base: component_find: iof "mca_iof_svc"
uses an MCA interface that is not recognized (component MCA v1.0.0 !=
supported MCA v2.0.0) -- ignored
Daemon [[14551,0],2] checking in as pid 40450 on host xserve04.local
Daemon [[14551,0],2] not using static ports
[saturna.cluster:07360] [[14551,0],0] plm:base:daemon_callback
[saturna.cluster:07360] progressed_wait: base/
plm_base_launch_support.c 459
[xserve04.local:40450] [[14551,0],2] orted: up and running - waiting
for commands!
[saturna.cluster:07360] defining message event: base/
plm_base_launch_support.c 423
[saturna.cluster:07360] defining message event: base/
plm_base_launch_support.c 423
[saturna.cluster:07360] [[14551,0],0] plm:base:orted_report_launch
from daemon [[14551,0],1]
[xserve03.local:40708] [[14551,0],1] orted: up and running - waiting
for commands!
[saturna.cluster:07360] [[14551,0],0] plm:base:orted_report_launch
completed for daemon [[14551,0],1]
[saturna.cluster:07360] [[14551,0],0] plm:base:orted_report_launch
from daemon [[14551,0],2]
[saturna.cluster:07360] [[14551,0],0] plm:base:orted_report_launch
completed for daemon [[14551,0],2]
[saturna.cluster:07360] [[14551,0],0] plm:base:daemon_callback completed
[saturna.cluster:07360] [[14551,0],0] plm:base:launch_apps for job
[14551,1]
[saturna.cluster:07360] defining message event: grpcomm_bad_module.c 183
[saturna.cluster:07360] [[14551,0],0] plm:base:report_launched for job
[14551,1]
[saturna.cluster:07360] progressed_wait: base/
plm_base_launch_support.c 712
[saturna.cluster:07360] [[14551,0],0] orte:daemon:cmd:processor called
by [[14551,0],0] for tag 1
[saturna.cluster:07360] [[14551,0],0] node[0].name saturna daemon 0
arch ffc90200
[saturna.cluster:07360] [[14551,0],0] node[1].name xserve03 daemon 1
arch ffc90200
[saturna.cluster:07360] [[14551,0],0] node[2].name xserve04 daemon 2
arch ffc90200
[saturna.cluster:07360] [[14551,0],0] orted_cmd: received
add_local_procs
[saturna.cluster:07360] [[14551,0],0] odls:constructing child list
[saturna.cluster:07360] [[14551,0],0] odls:construct_child_list
unpacking data to launch job [14551,1]
[saturna.cluster:07360] [[14551,0],0] odls:construct_child_list adding
new jobdat for job [14551,1]
[saturna.cluster:07360] [[14551,0],0] odls:construct_child_list
unpacking 1 app_contexts
[saturna.cluster:07360] [[14551,0],0] odls:constructing child list -
checking proc 0 on node 1 with daemon 1
[saturna.cluster:07360] [[14551,0],0] odls:constructing child list -
checking proc 1 on node 2 with daemon 2
[saturna.cluster:07360] [[14551,0],0] odls:constructing child list -
checking proc 2 on node 1 with daemon 1
[saturna.cluster:07360] [[14551,0],0] odls:constructing child list -
checking proc 3 on node 2 with daemon 2
[saturna.cluster:07360] [[14551,0],0] odls:constructing child list -
checking proc 4 on node 1 with daemon 1
[saturna.cluster:07360] [[14551,0],0] odls:constructing child list -
checking proc 5 on node 2 with daemon 2
[saturna.cluster:07360] [[14551,0],0] odls:constructing child list -
checking proc 6 on node 1 with daemon 1
[saturna.cluster:07360] [[14551,0],0] odls:constructing child list -
checking proc 7 on node 2 with daemon 2
[saturna.cluster:07360] [[14551,0],0] odls:constructing child list -
checking proc 8 on node 1 with daemon 1
[saturna.cluster:07360] [[14551,0],0] odls:constructing child list -
checking proc 9 on node 2 with daemon 2
[saturna.cluster:07360] [[14551,0],0] odls:constructing child list -
checking proc 10 on node 1 with daemon 1
[saturna.cluster:07360] [[14551,0],0] odls:constructing child list -
checking proc 11 on node 2 with daemon 2
[saturna.cluster:07360] [[14551,0],0] odls:constructing child list -
checking proc 12 on node 1 with daemon 1
[saturna.cluster:07360] [[14551,0],0] odls:constructing child list -
checking proc 13 on node 2 with daemon 2
[saturna.cluster:07360] [[14551,0],0] odls:constructing child list -
checking proc 14 on node 1 with daemon 1
[saturna.cluster:07360] [[14551,0],0] odls:constructing child list -
checking proc 15 on node 2 with daemon 2
[saturna.cluster:07360] [[14551,0],0] odls:construct:child:
num_participating 2
[saturna.cluster:07360] [[14551,0],0] odls:launch found 4 processors
for 0 children and set oversubscribed to false
[saturna.cluster:07360] [[14551,0],0] odls:launch reporting job
[14551,1] launch status
[saturna.cluster:07360] defining message event: base/
odls_base_default_fns.c 1219
[saturna.cluster:07360] [[14551,0],0] odls:launch setting waitpids
[saturna.cluster:07360] [[14551,0],0] orte:daemon:send_relay
[saturna.cluster:07360] [[14551,0],0] orte:daemon:send_relay sending
relay msg to 1
[saturna.cluster:07360] [[14551,0],0] orte:daemon:send_relay sending
relay msg to 2
[saturna.cluster:07360] [[14551,0],0] plm:base:app_report_launch from
daemon [[14551,0],0]
[saturna.cluster:07360] [[14551,0],0] plm:base:app_report_launch
completed processing
[xserve04.local:40450] [[14551,0],2] node[0].name saturna daemon 0
arch ffc90200
[xserve04.local:40450] [[14551,0],2] node[1].name xserve03 daemon 1
arch ffc90200
[xserve04.local:40450] [[14551,0],2] node[2].name xserve04 daemon 2
arch ffc90200
[xserve04.local:40450] [[14551,0],2] orted_cmd: received add_local_procs
[xserve03.local:40708] [[14551,0],1] node[0].name saturna daemon 0
arch ffc90200
[xserve03.local:40708] [[14551,0],1] node[1].name xserve03 daemon 1
arch ffc90200
[xserve03.local:40708] [[14551,0],1] node[2].name xserve04 daemon 2
arch ffc90200
[xserve03.local:40708] [[14551,0],1] orted_cmd: received add_local_procs
[saturna.cluster:07360] defining message event: base/
plm_base_launch_support.c 668
[saturna.cluster:07360] [[14551,0],0] plm:base:app_report_launch
reissuing non-blocking recv
[saturna.cluster:07360] [[14551,0],0] plm:base:app_report_launch from
daemon [[14551,0],1]
[saturna.cluster:07360] [[14551,0],0] plm:base:app_report_launched for
proc [[14551,1],0] from daemon [[14551,0],1]: pid 40710 state 2 exit 0
[saturna.cluster:07360] [[14551,0],0] plm:base:app_report_launched for
proc [[14551,1],2] from daemon [[14551,0],1]: pid 40711 state 2 exit 0
[saturna.cluster:07360] [[14551,0],0] plm:base:app_report_launched for
proc [[14551,1],4] from daemon [[14551,0],1]: pid 40712 state 2 exit 0
[saturna.cluster:07360] [[14551,0],0] plm:base:app_report_launched for
proc [[14551,1],6] from daemon [[14551,0],1]: pid 40713 state 2 exit 0
[saturna.cluster:07360] [[14551,0],0] plm:base:app_report_launched for
proc [[14551,1],8] from daemon [[14551,0],1]: pid 40714 state 2 exit 0
[saturna.cluster:07360] [[14551,0],0] plm:base:app_report_launched for
proc [[14551,1],10] from daemon [[14551,0],1]: pid 40715 state 2 exit 0
[saturna.cluster:07360] [[14551,0],0] plm:base:app_report_launched for
proc [[14551,1],12] from daemon [[14551,0],1]: pid 40716 state 2 exit 0
[saturna.cluster:07360] [[14551,0],0] plm:base:app_report_launched for
proc [[14551,1],14] from daemon [[14551,0],1]: pid 40717 state 2 exit 0
[saturna.cluster:07360] [[14551,0],0] plm:base:app_report_launch
completed processing
[saturna.cluster:07360] defining message event: base/
plm_base_launch_support.c 668
[saturna.cluster:07360] [[14551,0],0] plm:base:app_report_launch
reissuing non-blocking recv
[saturna.cluster:07360] [[14551,0],0] plm:base:app_report_launch from
daemon [[14551,0],2]
[saturna.cluster:07360] [[14551,0],0] plm:base:app_report_launched for
proc [[14551,1],1] from daemon [[14551,0],2]: pid 40452 state 2 exit 0
[saturna.cluster:07360] [[14551,0],0] plm:base:app_report_launched for
proc [[14551,1],3] from daemon [[14551,0],2]: pid 40453 state 2 exit 0
[saturna.cluster:07360] [[14551,0],0] plm:base:app_report_launched for
proc [[14551,1],5] from daemon [[14551,0],2]: pid 40454 state 2 exit 0
[saturna.cluster:07360] [[14551,0],0] plm:base:app_report_launched for
proc [[14551,1],7] from daemon [[14551,0],2]: pid 40455 state 2 exit 0
[saturna.cluster:07360] [[14551,0],0] plm:base:app_report_launched for
proc [[14551,1],9] from daemon [[14551,0],2]: pid 40456 state 2 exit 0
[saturna.cluster:07360] [[14551,0],0] plm:base:app_report_launched for
proc [[14551,1],11] from daemon [[14551,0],2]: pid 40457 state 2 exit 0
[saturna.cluster:07360] [[14551,0],0] plm:base:app_report_launched for
proc [[14551,1],13] from daemon [[14551,0],2]: pid 40458 state 2 exit 0
[saturna.cluster:07360] [[14551,0],0] plm:base:app_report_launched for
proc [[14551,1],15] from daemon [[14551,0],2]: pid 40459 state 2 exit 0
[saturna.cluster:07360] [[14551,0],0] plm:base:app_report_launch
completed processing
[saturna.cluster:07360] [[14551,0],0] plm:base:report_launched all
apps reported
[saturna.cluster:07360] [[14551,0],0] plm:base:launch wiring up iof
[saturna.cluster:07360] [[14551,0],0] plm:base:launch completed for
job [14551,1]
[xserve03.local:40708] [[14551,0],1] orted_recv: received sync+nidmap
from local proc [[14551,1],0]
[xserve03.local:40708] [[14551,0],1] orted_recv: received sync+nidmap
from local proc [[14551,1],2]
[xserve03.local:40708] [[14551,0],1] orted_recv: received sync+nidmap
from local proc [[14551,1],4]
[xserve04.local:40450] [[14551,0],2] orted_recv: received sync+nidmap
from local proc [[14551,1],3]
[xserve04.local:40450] [[14551,0],2] orted_recv: received sync+nidmap
from local proc [[14551,1],1]
[saturna.cluster:07360] defining message event: iof_hnp_receive.c 227
[xserve03.local:40710] mca: base: component_find: rcache
"mca_rcache_rb" uses an MCA interface that is not recognized
(component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
[saturna.cluster:07360] defining message event: iof_hnp_receive.c 227
[xserve03.local:40711] mca: base: component_find: rcache
"mca_rcache_rb" uses an MCA interface that is not recognized
(component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
[saturna.cluster:07360] defining message event: iof_hnp_receive.c 227
[xserve03.local:40712] mca: base: component_find: rcache
"mca_rcache_rb" uses an MCA interface that is not recognized
(component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
[saturna.cluster:07360] defining message event: iof_hnp_receive.c 227
[xserve04.local:40453] mca: base: component_find: rcache
"mca_rcache_rb" uses an MCA interface that is not recognized
(component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
[saturna.cluster:07360] defining message event: iof_hnp_receive.c 227
[xserve04.local:40450] [[14551,0],2] orted_recv: received sync+nidmap
from local proc [[14551,1],7]
[xserve04.local:40452] mca: base: component_find: rcache
"mca_rcache_rb" uses an MCA interface that is not recognized
(component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
[xserve03.local:40708] [[14551,0],1] orted_recv: received sync+nidmap
from local proc [[14551,1],6]
[xserve04.local:40450] [[14551,0],2] orted_recv: received sync+nidmap
from local proc [[14551,1],5]
[saturna.cluster:07360] defining message event: iof_hnp_receive.c 227
[xserve03.local:40708] [[14551,0],1] orted_recv: received sync+nidmap
from local proc [[14551,1],8]
[xserve04.local:40455] mca: base: component_find: rcache
"mca_rcache_rb" uses an MCA interface that is not recognized
(component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
[saturna.cluster:07360] defining message event: iof_hnp_receive.c 227
[xserve03.local:40713] mca: base: component_find: rcache
"mca_rcache_rb" uses an MCA interface that is not recognized
(component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
[saturna.cluster:07360] defining message event: iof_hnp_receive.c 227
[xserve04.local:40454] mca: base: component_find: rcache
"mca_rcache_rb" uses an MCA interface that is not recognized
(component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
[xserve03.local:40708] [[14551,0],1] orted_cmd: received collective
data cmd
[saturna.cluster:07360] defining message event: iof_hnp_receive.c 227
[xserve03.local:40714] mca: base: component_find: rcache
"mca_rcache_rb" uses an MCA interface that is not recognized
(component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
[xserve04.local:40450] [[14551,0],2] orted_recv: received sync+nidmap
from local proc [[14551,1],9]
[xserve03.local:40708] [[14551,0],1] orted_recv: received sync+nidmap
from local proc [[14551,1],10]
[xserve03.local:40708] [[14551,0],1] orted_recv: received sync+nidmap
from local proc [[14551,1],12]
[xserve03.local:40708] [[14551,0],1] orted_cmd: received collective
data cmd
[xserve03.local:40708] [[14551,0],1] orted_cmd: received collective
data cmd
[xserve04.local:40450] [[14551,0],2] orted_cmd: received collective
data cmd
[saturna.cluster:07360] defining message event: base/
routed_base_receive.c 153
[xserve03.local:40708] [[14551,0],1] orted_recv: received sync+nidmap
from local proc [[14551,1],14]
[xserve04.local:40450] [[14551,0],2] orted_recv: received sync+nidmap
from local proc [[14551,1],11]
[xserve04.local:40450] [[14551,0],2] orted_recv: received sync+nidmap
from local proc [[14551,1],15]
[xserve04.local:40450] [[14551,0],2] orted_cmd: received collective
data cmd
[xserve04.local:40450] [[14551,0],2] orted_recv: received sync+nidmap
from local proc [[14551,1],13]
[saturna.cluster:07360] defining message event: base/
routed_base_receive.c 153
[saturna.cluster:07360] defining message event: iof_hnp_receive.c 227
[xserve03.local:40715] mca: base: component_find: rcache
"mca_rcache_rb" uses an MCA interface that is not recognized
(component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
[saturna.cluster:07360] defining message event: iof_hnp_receive.c 227
[xserve03.local:40716] mca: base: component_find: rcache
"mca_rcache_rb" uses an MCA interface that is not recognized
(component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
[saturna.cluster:07360] defining message event: iof_hnp_receive.c 227
[xserve03.local:40717] mca: base: component_find: rcache
"mca_rcache_rb" uses an MCA interface that is not recognized
(component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
[saturna.cluster:07360] defining message event: iof_hnp_receive.c 227
[saturna.cluster:07360] defining message event: iof_hnp_receive.c 227
[xserve04.local:40456] mca: base: component_find: rcache
"mca_rcache_rb" uses an MCA interface that is not recognized
(component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
[xserve04.local:40457] mca: base: component_find: rcache
"mca_rcache_rb" uses an MCA interface that is not recognized
(component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
[xserve03.local:40708] [[14551,0],1] orted_cmd: received collective
data cmd
[saturna.cluster:07360] defining message event: iof_hnp_receive.c 227
[xserve04.local:40459] mca: base: component_find: rcache
"mca_rcache_rb" uses an MCA interface that is not recognized
(component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
[saturna.cluster:07360] defining message event: iof_hnp_receive.c 227
[xserve04.local:40458] mca: base: component_find: rcache
"mca_rcache_rb" uses an MCA interface that is not recognized
(component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
[xserve04.local:40450] [[14551,0],2] orted_cmd: received collective
data cmd
[xserve03.local:40708] [[14551,0],1] orted_cmd: received collective
data cmd
[xserve04.local:40450] [[14551,0],2] orted_cmd: received collective
data cmd
[xserve03.local:40708] [[14551,0],1] orted_cmd: received collective
data cmd
[xserve03.local:40708] [[14551,0],1] orted_cmd: received collective
data cmd
[saturna.cluster:07360] [[14551,0],0] orted_recv_cmd: received message
from [[14551,0],1]
[saturna.cluster:07360] defining message event: orted/orted_comm.c 159
[xserve03.local:40708] [[14551,0],1] orted_cmd: received collective
data cmd
[saturna.cluster:07360] [[14551,0],0] orted_recv_cmd: reissued recv
[saturna.cluster:07360] [[14551,0],0] orte:daemon:cmd:processor called
by [[14551,0],1] for tag 1
[saturna.cluster:07360] [[14551,0],0] orted_cmd: received collective
data cmd
[saturna.cluster:07360] [[14551,0],0] odls: daemon collective called
[saturna.cluster:07360] [[14551,0],0] odls: daemon collective for job
[14551,1] from [[14551,0],1] type 2 num_collected 1 num_participating
2 num_contributors 8
[saturna.cluster:07360] [[14551,0],0] orte:daemon:cmd:processor:
processing commands completed
[xserve04.local:40450] [[14551,0],2] orted_cmd: received collective
data cmd
[xserve04.local:40450] [[14551,0],2] orted_cmd: received collective
data cmd
[xserve04.local:40450] [[14551,0],2] orted_cmd: received collective
data cmd
[saturna.cluster:07360] [[14551,0],0] orted_recv_cmd: received message
from [[14551,0],2]
[xserve04.local:40450] [[14551,0],2] orted_cmd: received collective
data cmd
[saturna.cluster:07360] defining message event: orted/orted_comm.c 159
[saturna.cluster:07360] [[14551,0],0] orted_recv_cmd: reissued recv
[saturna.cluster:07360] [[14551,0],0] orte:daemon:cmd:processor called
by [[14551,0],2] for tag 1
[saturna.cluster:07360] [[14551,0],0] orted_cmd: received collective
data cmd
[saturna.cluster:07360] [[14551,0],0] odls: daemon collective called
[saturna.cluster:07360] [[14551,0],0] odls: daemon collective for job
[14551,1] from [[14551,0],2] type 2 num_collected 2 num_participating
2 num_contributors 16
[saturna.cluster:07360] [[14551,0],0] odls: daemon collective HNP -
xcasting to job [14551,1]
[saturna.cluster:07360] [[14551,0],0] ORTE_ERROR_LOG: Buffer type (described vs non-described) mismatch - operation not allowed in file base/odls_base_default_fns.c at line 2475
[saturna.cluster:07360] [[14551,0],0] orte:daemon:cmd:processor:
processing commands completed
^C[saturna.cluster:07360] defining timer event: 0 sec 0 usec at orterun.c:1128
Killed by signal 2.
mpirun: killing job...
Killed by signal 2.
[saturna.cluster:07360] [[14551,0],0]:orterun.c(1031) updating exit
status to 1
[saturna.cluster:07360] [[14551,0],0] plm:base:orted_cmd sending
kill_local_procs cmds
[saturna.cluster:07360] [[14551,0],0]
plm:base:orted_cmd:kill_local_procs abnormal term ordered
[saturna.cluster:07360] defining message event: base/
plm_base_orted_cmds.c 276
[saturna.cluster:07360] [[14551,0],0]
plm:base:orted_cmd:kill_local_procs sending cmd to [[14551,0],1]
[saturna.cluster:07360] [[14551,0],0] plm:base:orted_cmd message to
[[14551,0],1] sent
[saturna.cluster:07360] [[14551,0],0]
plm:base:orted_cmd:kill_local_procs sending cmd to [[14551,0],2]
[saturna.cluster:07360] [[14551,0],0] plm:base:orted_cmd message to
[[14551,0],2] sent
[saturna.cluster:07360] [[14551,0],0] plm:base:orted_cmd all messages
sent
[saturna.cluster:07360] defining timeout: 0 sec 2000 usec at base/
plm_base_orted_cmds.c:321
[saturna.cluster:07360] progressed_wait: base/plm_base_orted_cmds.c 324
[saturna.cluster:07360] defining timeout: 0 sec 16000 usec at
orterun.c:1066
[saturna.cluster:07360] [[14551,0],0] orte:daemon:cmd:processor called
by [[14551,0],0] for tag 1
[saturna.cluster:07360] [[14551,0],0] odls:kill_local_proc working on
job [WILDCARD]
[saturna.cluster:07360] defining message event: base/
odls_base_default_fns.c 2267
[saturna.cluster:07360] [[14551,0],0] orte:daemon:cmd:processor:
processing commands completed
[saturna.cluster:07360] [[14551,0],0] plm:base:check_job_completed
called with NULL pointer
[saturna.cluster:07360] [[14551,0],0] plm:base:check_job_completed job
[14551,1] is not terminated
[saturna.cluster:07360] [[14551,0],0] daemon 2 failed with status 255
[saturna.cluster:07360] [[14551,0],0] plm:base:launch_failed abort in
progress, ignoring report
[saturna.cluster:07360] [[14551,0],0] daemon 1 failed with status 255
[saturna.cluster:07360] [[14551,0],0] plm:base:launch_failed abort in
progress, ignoring report
[saturna.cluster:07360] [[14551,0],0] plm:base:receive got message
from [[14551,0],1]
[saturna.cluster:07360] defining message event: base/
plm_base_receive.c 327
[saturna.cluster:07360] [[14551,0],0] plm:base:receive got message
from [[14551,0],2]
[saturna.cluster:07360] defining message event: base/
plm_base_receive.c 327
[saturna.cluster:07360] [[14551,0],0] plm:base:receive got
update_proc_state for job [14551,1]
[saturna.cluster:07360] [[14551,0],0] plm:base:receive got
update_proc_state for vpid 0 state 400 exit_code 0
[saturna.cluster:07360] [[14551,0],0] plm:base:receive got
update_proc_state for vpid 2 state 400 exit_code 0
[saturna.cluster:07360] [[14551,0],0] plm:base:receive got
update_proc_state for vpid 4 state 400 exit_code 0
[saturna.cluster:07360] [[14551,0],0] plm:base:receive got
update_proc_state for vpid 6 state 400 exit_code 0
[saturna.cluster:07360] [[14551,0],0] plm:base:receive got
update_proc_state for vpid 8 state 400 exit_code 0
[saturna.cluster:07360] [[14551,0],0] plm:base:receive got
update_proc_state for vpid 10 state 400 exit_code 0
[saturna.cluster:07360] [[14551,0],0] plm:base:receive got
update_proc_state for vpid 12 state 400 exit_code 0
[saturna.cluster:07360] [[14551,0],0] plm:base:receive got
update_proc_state for vpid 14 state 400 exit_code 0
[saturna.cluster:07360] [[14551,0],0] plm:base:check_job_completed for
job [14551,1] - num_terminated 8 num_procs 16
[saturna.cluster:07360] [[14551,0],0] plm:base:check_job_completed
declared job [14551,1] aborted by proc [[14551,1],0] with code 0
[saturna.cluster:07360] [[14551,0],0] plm:base:check_job_completed job
[14551,1] is not terminated
[saturna.cluster:07360] [[14551,0],0] plm:base:receive got
update_proc_state for job [14551,1]
[saturna.cluster:07360] [[14551,0],0] plm:base:receive got
update_proc_state for vpid 1 state 400 exit_code 0
[saturna.cluster:07360] [[14551,0],0] plm:base:receive got
update_proc_state for vpid 3 state 400 exit_code 0
[saturna.cluster:07360] [[14551,0],0] plm:base:receive got
update_proc_state for vpid 5 state 400 exit_code 0
[saturna.cluster:07360] [[14551,0],0] plm:base:receive got
update_proc_state for vpid 7 state 400 exit_code 0
[saturna.cluster:07360] [[14551,0],0] plm:base:receive got
update_proc_state for vpid 9 state 400 exit_code 0
[saturna.cluster:07360] [[14551,0],0] plm:base:receive got
update_proc_state for vpid 11 state 400 exit_code 0
[saturna.cluster:07360] [[14551,0],0] plm:base:receive got
update_proc_state for vpid 13 state 400 exit_code 0
[saturna.cluster:07360] [[14551,0],0] plm:base:receive got
update_proc_state for vpid 15 state 400 exit_code 0
[saturna.cluster:07360] [[14551,0],0] plm:base:check_job_completed for
job [14551,1] - num_terminated 16 num_procs 16
[saturna.cluster:07360] [[14551,0],0] plm:base:check_job_completed
declared job [14551,1] aborted by proc [[14551,1],0] with code 0
[saturna.cluster:07360] [[14551,0],0] plm:base:check_job_completed all
jobs terminated - waking up
[saturna.cluster:07360] [[14551,0],0] calling job_complete trigger
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 40710 on node xserve03
exited on signal 0 (Signal 0).
--------------------------------------------------------------------------
16 total processes killed (some possibly by mpirun during cleanup)
[saturna.cluster:07360] [[14551,0],0] plm:base:orted_cmd sending
orted_exit commands
[saturna.cluster:07360] [[14551,0],0] plm:base:orted_cmd:orted_exit
abnormal term ordered
[saturna.cluster:07360] defining message event: base/
plm_base_orted_cmds.c 142
[saturna.cluster:07360] defining timeout: 0 sec 0 usec at base/
plm_base_orted_cmds.c:186
[saturna.cluster:07360] progressed_wait: base/plm_base_orted_cmds.c 189
[saturna.cluster:07360] defining timeout: 0 sec 3000 usec at orterun.c:752
[saturna.cluster:07360] [[14551,0],0] orte:daemon:cmd:processor called
by [[14551,0],0] for tag 1
[saturna.cluster:07360] [[14551,0],0] orted_cmd: received exit
[saturna.cluster:07360] [[14551,0],0] odls:kill_local_proc working on
job [WILDCARD]
[saturna.cluster:07360] [[14551,0],0] plm:base:check_job_completed for
job [14551,0] - num_terminated 3 num_procs 3
[saturna.cluster:07360] [[14551,0],0] plm:base:check_job_completed
declared job [14551,0] failed to start by proc [[14551,0],1]
[saturna.cluster:07360] [[14551,0],0] calling orted_exit trigger
mpirun: clean termination accomplished