Hello All,

I am trying to run an Open MPI application across two physical machines.

I get the error 'Returned "Unreachable" (-12) instead of "Success" (0)', and
looking through the logs (attached), I cannot figure out the cause or how to
fix it.

I see a lot of (communication) components being loaded and then unloaded, but
I cannot tell which node picks up which kind of communication interface.
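One way to see which BTLs each rank actually brought up is to filter the btl_base_verbose output down to its "select: init of component X returned Y" lines. Below is a minimal Python sketch; the function names are hypothetical, and the regular expression assumes the --tag-output format seen in the attached log ("[jobid,rank]<stddiag>:[host:pid]"). It only reports whether each BTL initialized. In the attached log all seven ranks report self, sm and tcp as initialized successfully, and the failure is reported afterwards as "PML add procs failed".

```python
import re

# One BTL init result from btl_base_verbose output with --tag-output, e.g.
#   "...2012[1,4]<stddiag>:[tik34x:03332] select: init of component tcp returned success"
# \s+ also matches the line breaks a mail client inserts when it wraps such lines.
SELECT_RE = re.compile(
    r"\[\d+,(?P<rank>\d+)\]<stddiag>:\[(?P<host>[\w.-]+):\d+\]\s+"
    r"select:\s+init\s+of\s+component\s+(?P<btl>\w+)\s+returned\s+(?P<status>\w+)"
)


def btls_by_rank(log_text):
    """Return {rank: (host, {btl: status})} for every BTL init result in the log."""
    result = {}
    for m in SELECT_RE.finditer(log_text):
        rank = int(m.group("rank"))
        _host, btls = result.setdefault(rank, (m.group("host"), {}))
        btls[m.group("btl")] = m.group("status")
    return result


def print_summary(log_text):
    """Print one line per rank, such as 'rank 4 on tik34x: self=success, sm=success, tcp=success'."""
    for rank, (host, btls) in sorted(btls_by_rank(log_text).items()):
        states = ", ".join(f"{btl}={status}" for btl, status in btls.items())
        print(f"rank {rank} on {host}: {states}")
```

Usage would be something like print_summary(open("mpirun.log").read()), where mpirun.log is a hypothetical file holding the captured mpirun output.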

--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[10782,1],6]) is on host: tik34x
  Process 2 ([[10782,1],0]) is on host: tik33x
  BTLs attempted: self sm tcp

Your MPI job is now going to abort; sorry.
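The message says that for the pair (rank 6 on tik34x, rank 0 on tik33x) none of the listed BTLs claimed the peer. The sketch below is a simplified model, an assumption for illustration and not Open MPI's actual selection code: self only reaches the process itself, sm only reaches ranks on the same host, and tcp is the only one of the three that can cross hosts (it ignores that tcp can also connect ranks on the same host). With the job map from the attached log (ranks 0-3 on tik33x, ranks 4-6 on tik34x), every cross-host pair depends on tcp alone; the function names are hypothetical.

```python
from itertools import combinations

# Rank -> host, copied from the JOB MAP section of the attached log.
JOB_MAP = {0: "tik33x", 1: "tik33x", 2: "tik33x", 3: "tik33x",
           4: "tik34x", 5: "tik34x", 6: "tik34x"}


def candidate_btls(a, b, hosts, tcp_between_hosts=True):
    """BTLs from 'self,sm,tcp' that could connect ranks a and b (simplified model)."""
    if a == b:
        return ["self"]           # self: a process talking to itself
    if hosts[a] == hosts[b]:
        return ["sm"]             # sm: shared memory, same host only
    # Different hosts: of the three BTLs, only tcp can cross the network.
    return ["tcp"] if tcp_between_hosts else []


def unreachable_pairs(hosts, tcp_between_hosts):
    """All rank pairs for which the model leaves no usable BTL."""
    return [(a, b) for a, b in combinations(sorted(hosts), 2)
            if not candidate_btls(a, b, hosts, tcp_between_hosts)]


print(candidate_btls(6, 0, JOB_MAP))                              # ['tcp']
print(len(unreachable_pairs(JOB_MAP, tcp_between_hosts=False)))  # 12
```

In this model, if tcp cannot pair up the two hosts, all 12 cross-host pairs become unreachable, while same-host pairs are unaffected.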

The "mpirun" line is:

mpirun --mca btl self,sm,tcp --mca btl_base_verbose 30 -report-pid -display-map \
    -report-bindings -hostfile hostfile -np 7 -v --rankfile rankfile.txt -v \
    --timestamp-output --tag-output ./xstartwrapper.sh ./run_gdb.sh

where the two .sh scripts are workarounds that forward X windows from multiple
machines to the machines where I am logged in.
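The log further down suggests setting the MCA parameter orte_base_help_aggregate to 0, so that every process's help message is printed instead of "4 more processes have sent help message". A minimal Python sketch that rebuilds the command above as an argument list with that parameter added; build_mpirun_argv is a hypothetical helper, the duplicated -v is collapsed to one, and the list is only printed here, not executed.

```python
import shlex


def build_mpirun_argv(np=7, extra_mca=None):
    """Rebuild the mpirun command from this post as an argv list.

    extra_mca: optional dict of additional MCA parameters, e.g.
    {"orte_base_help_aggregate": 0} to print every help/error message.
    """
    argv = ["mpirun",
            "--mca", "btl", "self,sm,tcp",
            "--mca", "btl_base_verbose", "30"]
    for name, value in (extra_mca or {}).items():
        argv += ["--mca", name, str(value)]
    argv += ["-report-pid", "-display-map", "-report-bindings",
             "-hostfile", "hostfile", "-np", str(np), "-v",
             "--rankfile", "rankfile.txt",
             "--timestamp-output", "--tag-output",
             "./xstartwrapper.sh", "./run_gdb.sh"]
    return argv


cmd = build_mpirun_argv(extra_mca={"orte_base_help_aggregate": 0})
print(shlex.join(cmd))  # shell-ready form of the same command (Python 3.8+)
```

Passing the list to subprocess.run(cmd) would launch the job directly, without going through a shell.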

Can anyone help?

Thanks a lot.

Best,

Devendra
--- Begin Message ---
reset: standard error: Invalid argument

Destination is: r...@tik33x.ethz.ch
Host is: tik33x
Destination is: r...@tik34x.ethz.ch
Host is: tik34x
-----------------------------------------------------------------------------
It seems that there is no lamd running on the host tik33x.

This indicates that the LAM/MPI runtime environment is not operating.
The LAM/MPI runtime environment is necessary for the "lamhalt" command.

Please run the "lamboot" command the start the LAM/MPI runtime
environment.  See the LAM/MPI documentation for how to invoke
"lamboot" across multiple machines.
-----------------------------------------------------------------------------

LAM 7.1.2/MPI 2 C++/ROMIO - Indiana University

n-1<11170> ssi:boot:open: opening
n-1<11170> ssi:boot:open: opening boot module globus
n-1<11170> ssi:boot:open: opened boot module globus
n-1<11170> ssi:boot:open: opening boot module rsh
n-1<11170> ssi:boot:open: opened boot module rsh
n-1<11170> ssi:boot:open: opening boot module slurm
n-1<11170> ssi:boot:open: opened boot module slurm
n-1<11170> ssi:boot:select: initializing boot module slurm
n-1<11170> ssi:boot:slurm: not running under SLURM
n-1<11170> ssi:boot:select: boot module not available: slurm
n-1<11170> ssi:boot:select: initializing boot module rsh
n-1<11170> ssi:boot:rsh: module initializing
n-1<11170> ssi:boot:rsh:agent: /usr/bin/rsh
n-1<11170> ssi:boot:rsh:username: <same>
n-1<11170> ssi:boot:rsh:verbose: 1000
n-1<11170> ssi:boot:rsh:algorithm: linear
n-1<11170> ssi:boot:rsh:no_n: 0
n-1<11170> ssi:boot:rsh:no_profile: 0
n-1<11170> ssi:boot:rsh:fast: 0
n-1<11170> ssi:boot:rsh:ignore_stderr: 0
n-1<11170> ssi:boot:rsh:priority: 10
n-1<11170> ssi:boot:select: boot module available: rsh, priority: 10
n-1<11170> ssi:boot:select: initializing boot module globus
n-1<11170> ssi:boot:globus: globus-job-run not found, globus boot will not run
n-1<11170> ssi:boot:select: boot module not available: globus
n-1<11170> ssi:boot:select: finalizing boot module slurm
n-1<11170> ssi:boot:slurm: finalizing
n-1<11170> ssi:boot:select: closing boot module slurm
n-1<11170> ssi:boot:select: finalizing boot module globus
n-1<11170> ssi:boot:globus: finalizing
n-1<11170> ssi:boot:select: closing boot module globus
n-1<11170> ssi:boot:select: selected boot module rsh
n-1<11170> ssi:boot:base: looking for boot schema in following directories:
n-1<11170> ssi:boot:base:   <current directory>
n-1<11170> ssi:boot:base:   $TROLLIUSHOME/etc
n-1<11170> ssi:boot:base:   $LAMHOME/etc
n-1<11170> ssi:boot:base:   /usr/lib/lam/etc
n-1<11170> ssi:boot:base: looking for boot schema file:
n-1<11170> ssi:boot:base:   TIK_lamboot_hostfile.txt
n-1<11170> ssi:boot:base: found boot schema: TIK_lamboot_hostfile.txt
n-1<11170> ssi:boot:rsh: found the following hosts:
n-1<11170> ssi:boot:rsh:   n0 tik33x.ethz.ch (cpu=1) 
n-1<11170> ssi:boot:rsh:   n1 tik34x.ethz.ch (cpu=1) 
n-1<11170> ssi:boot:rsh: resolved hosts:
n-1<11170> ssi:boot:rsh:   n0 tik33x.ethz.ch --> 129.132.67.174 (origin)
n-1<11170> ssi:boot:rsh:   n1 tik34x.ethz.ch --> 129.132.67.175
n-1<11170> ssi:boot:rsh: starting RTE procs
n-1<11170> ssi:boot:base:linear: starting
n-1<11170> ssi:boot:base:server: opening server TCP socket
n-1<11170> ssi:boot:base:server: opened port 41544
n-1<11170> ssi:boot:base:linear: booting n0 (tik33x.ethz.ch)
n-1<11170> ssi:boot:rsh: starting lamd on (tik33x.ethz.ch)
n-1<11170> ssi:boot:rsh: starting on n0 (tik33x.ethz.ch): hboot -t -c lam-conf.lamd -d -v -I -H tik33x.ethz.ch -P 41544 -n 0 -o 0
n-1<11170> ssi:boot:rsh: launching locally
tkill: setting prefix to (null)
tkill: setting suffix to (null)
tkill: got killname back: /tmp/lam-raid@tik33x/lam-killfile
tkill: f_kill = "/tmp/lam-raid@tik33x/lam-killfile"
tkill: nothing to kill: "/tmp/lam-raid@tik33x/lam-killfile"
hboot: performing tkill
hboot: tkill -d 
hboot: booting...
hboot: fork /usr/bin/X11/lamd
[1]  11200 lamd -H tik33x.ethz.ch -P 41544 -n 0 -o 0 -d
n-1<11170> ssi:boot:rsh: successfully launched on n0 (tik33x.ethz.ch)
n-1<11170> ssi:boot:base:server: expecting connection from finite list
n-1<11200> ssi:boot:open: opening
n-1<11200> ssi:boot:open: opening boot module globus
n-1<11200> ssi:boot:open: opened boot module globus
n-1<11200> ssi:boot:open: opening boot module rsh
n-1<11200> ssi:boot:open: opened boot module rsh
n-1<11200> ssi:boot:open: opening boot module slurm
n-1<11200> ssi:boot:open: opened boot module slurm
n-1<11200> ssi:boot:select: initializing boot module slurm
n-1<11200> ssi:boot:slurm: not running under SLURM
n-1<11200> ssi:boot:select: boot module not available: slurm
n-1<11200> ssi:boot:select: initializing boot module rsh
n-1<11200> ssi:boot:rsh: module initializing
n-1<11200> ssi:boot:rsh:agent: /usr/bin/rsh
n-1<11200> ssi:boot:rsh:username: <same>
n-1<11200> ssi:boot:rsh:verbose: 1000
n-1<11200> ssi:boot:rsh:algorithm: linear
n-1<11200> ssi:boot:rsh:no_n: 0
n-1<11200> ssi:boot:rsh:no_profile: 0
n-1<11200> ssi:boot:rsh:fast: 0
n-1<11200> ssi:boot:rsh:ignore_stderr: 0
n-1<11200> ssi:boot:rsh:priority: 10
n-1<11200> ssi:boot:select: boot module available: rsh, priority: 10
n-1<11200> ssi:boot:select: initializing boot module globus
n-1<11200> ssi:boot:globus: globus-job-run not found, globus boot will not run
n-1<11200> ssi:boot:select: boot module not available: globus
n-1<11200> ssi:boot:select: finalizing boot module slurm
n-1<11200> ssi:boot:slurm: finalizing
n-1<11200> ssi:boot:select: closing boot module slurm
n-1<11200> ssi:boot:select: finalizing boot module globus
n-1<11200> ssi:boot:globus: finalizing
n-1<11200> ssi:boot:select: closing boot module globus
n-1<11200> ssi:boot:select: selected boot module rsh
n-1<11200> ssi:boot:send_lamd: getting node ID from command line
n-1<11200> ssi:boot:send_lamd: getting agent haddr from command line
n-1<11200> ssi:boot:send_lamd: getting agent port from command line
n-1<11200> ssi:boot:send_lamd: getting node ID from command line
n-1<11200> ssi:boot:send_lamd: connecting to 129.132.67.174:41544, node id 0
n-1<11170> ssi:boot:base:server: got connection from 129.132.67.174
n-1<11170> ssi:boot:base:server: this connection is expected (n0)
n-1<11200> ssi:boot:send_lamd: sending dli_port 41973
n-1<11170> ssi:boot:base:server: remote lamd is at 129.132.67.174:41973
n-1<11170> ssi:boot:base:linear: booting n1 (tik34x.ethz.ch)
n-1<11170> ssi:boot:rsh: starting lamd on (tik34x.ethz.ch)
n-1<11170> ssi:boot:rsh: starting on n1 (tik34x.ethz.ch): hboot -t -c lam-conf.lamd -d -v -s -I "-H tik33x.ethz.ch -P 41544 -n 1 -o 0"
n-1<11170> ssi:boot:rsh: launching remotely
n-1<11170> ssi:boot:rsh: attempting to execute: /usr/bin/rsh tik34x.ethz.ch -n -l raid 'echo $SHELL'
n-1<11170> ssi:boot:rsh: remote shell /bin/tcsh
n-1<11170> ssi:boot:rsh: attempting to execute: /usr/bin/rsh tik34x.ethz.ch -n -l raid hboot -t -c lam-conf.lamd -d -v -s -I '"-H tik33x.ethz.ch -P 41544 -n 1 -o 0"'
tkill: setting prefix to (null)
tkill: setting suffix to (null)

LAM 7.1.2/MPI 2 C++/ROMIO - Indiana University

tkill: got killname back: /tmp/lam-raid@tik34x/lam-killfile
tkill: f_kill = "/tmp/lam-raid@tik34x/lam-killfile"
tkill: nothing to kill: "/tmp/lam-raid@tik34x/lam-killfile"
hboot: performing tkill
hboot: tkill -d 
hboot: booting...
hboot: fork /usr/bin/lamd
[1]   3246 lamd -H tik33x.ethz.ch -P 41544 -n 1 -o 0 -d
n-1<11170> ssi:boot:rsh: successfully launched on n1 (tik34x.ethz.ch)
n-1<11170> ssi:boot:base:server: expecting connection from finite list
n-1<11170> ssi:boot:base:server: got connection from 129.132.67.175
n-1<11170> ssi:boot:base:server: this connection is expected (n1)
n-1<11170> ssi:boot:base:server: remote lamd is at 129.132.67.175:37523
n-1<11170> ssi:boot:base:server: closing server socket
n-1<11170> ssi:boot:base:server: connecting to lamd at 129.132.67.174:51093
n-1<11170> ssi:boot:base:server: connected
n-1<11170> ssi:boot:base:server: sending number of links (2)
n-1<11170> ssi:boot:base:server: sending info: n0 (tik33x.ethz.ch)
n-1<11170> ssi:boot:base:server: sending info: n1 (tik34x.ethz.ch)
n-1<11170> ssi:boot:base:server: finished sending
n-1<11170> ssi:boot:base:server: disconnected from 129.132.67.174:51093
n-1<11170> ssi:boot:base:server: connecting to lamd at 129.132.67.175:53828
n-1<11170> ssi:boot:base:server: connected
n-1<11170> ssi:boot:base:server: sending number of links (2)
n-1<11170> ssi:boot:base:server: sending info: n0 (tik33x.ethz.ch)
n-1<11170> ssi:boot:base:server: sending info: n1 (tik34x.ethz.ch)
n-1<11170> ssi:boot:base:server: finished sending
n-1<11170> ssi:boot:base:server: disconnected from 129.132.67.175:53828
n-1<11170> ssi:boot:base:linear: finished
n-1<11170> ssi:boot:rsh: all RTE procs started
n-1<11170> ssi:boot:rsh: finalizing
n-1<11170> ssi:boot: Closing
n-1<11200> ssi:boot:rsh: finalizing
n-1<11200> ssi:boot: Closing
rm: cannot remove `setuplog.*': No such file or directory

 ========================   JOB MAP   ========================

 Data for node: Name: tik33x    Num procs: 4
        Process OMPI jobid: [10782,1] Process rank: 0
        Process OMPI jobid: [10782,1] Process rank: 1
        Process OMPI jobid: [10782,1] Process rank: 2
        Process OMPI jobid: [10782,1] Process rank: 3

 Data for node: Name: tik34x.ethz.ch    Num procs: 3
        Process OMPI jobid: [10782,1] Process rank: 4
        Process OMPI jobid: [10782,1] Process rank: 5
        Process OMPI jobid: [10782,1] Process rank: 6

 =============================================================
mpirun pid: 11303
Wed May 16 12:09:45 2012[1,0]<stderr>:[tik33x:11303] [[10782,0],0] odls:default:fork binding child [[10782,1],0] to slot_list 0:0
Wed May 16 12:09:45 2012[1,1]<stderr>:[tik33x:11303] [[10782,0],0] odls:default:fork binding child [[10782,1],1] to slot_list 0:1
Wed May 16 12:09:45 2012[1,2]<stderr>:[tik33x:11303] [[10782,0],0] odls:default:fork binding child [[10782,1],2] to slot_list 1:0
Wed May 16 12:09:45 2012[1,3]<stderr>:[tik33x:11303] [[10782,0],0] odls:default:fork binding child [[10782,1],3] to slot_list 1:1
Wed May 16 12:09:45 2012[1,0]<stdout>:Running DAL on tik33x
Wed May 16 12:09:45 2012[1,1]<stdout>:Running DAL on tik33x
Wed May 16 12:09:45 2012[1,2]<stdout>:Running DAL on tik33x
Wed May 16 12:09:45 2012[1,4]<stdout>:Running DAL on tik34x
Wed May 16 12:09:45 2012[1,5]<stdout>:Running DAL on tik34x
Wed May 16 12:09:45 2012[1,6]<stdout>:Running DAL on tik34x
Wed May 16 12:09:45 2012[1,3]<stdout>:Running DAL on tik33x
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] mca: base: components_open: Looking for btl components
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] mca: base: components_open: Looking for btl components
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] mca: base: components_open: Looking for btl components
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] mca: base: components_open: Looking for btl components
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] mca: base: components_open: opening btl components
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] mca: base: components_open: found loaded component self
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] mca: base: components_open: component self has no register function
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] mca: base: components_open: component self open function successful
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] mca: base: components_open: found loaded component sm
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] mca: base: components_open: component sm has no register function
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] mca: base: components_open: opening btl components
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] mca: base: components_open: found loaded component self
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] mca: base: components_open: component self has no register function
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] mca: base: components_open: component self open function successful
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] mca: base: components_open: found loaded component sm
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] mca: base: components_open: component sm has no register function
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] mca: base: components_open: component sm open function successful
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] mca: base: components_open: found loaded component tcp
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] mca: base: components_open: component tcp has no register function
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] mca: base: components_open: opening btl components
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] mca: base: components_open: found loaded component self
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] mca: base: components_open: component self has no register function
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] mca: base: components_open: component self open function successful
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] mca: base: components_open: found loaded component sm
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] mca: base: components_open: component sm has no register function
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] mca: base: components_open: component sm open function successful
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] mca: base: components_open: found loaded component tcp
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] mca: base: components_open: component tcp has no register function
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] mca: base: components_open: component sm open function successful
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] mca: base: components_open: found loaded component tcp
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] mca: base: components_open: component tcp has no register function
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] mca: base: components_open: component tcp open function successful
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] mca: base: components_open: component tcp open function successful
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] mca: base: components_open: component tcp open function successful
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] mca: base: components_open: opening btl components
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] mca: base: components_open: found loaded component self
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] mca: base: components_open: component self has no register function
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] mca: base: components_open: component self open function successful
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] mca: base: components_open: found loaded component sm
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] mca: base: components_open: component sm has no register function
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] mca: base: components_open: component sm open function successful
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] mca: base: components_open: found loaded component tcp
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] mca: base: components_open: component tcp has no register function
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] mca: base: components_open: component tcp open function successful
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] mca: base: components_open: Looking for btl components
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] mca: base: components_open: Looking for btl components
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] mca: base: components_open: Looking for btl components
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] mca: base: components_open: opening btl components
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] mca: base: components_open: found loaded component self
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] mca: base: components_open: component self has no register function
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] mca: base: components_open: component self open function successful
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] mca: base: components_open: found loaded component sm
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] mca: base: components_open: component sm has no register function
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] mca: base: components_open: component sm open function successful
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] mca: base: components_open: opening btl components
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] mca: base: components_open: found loaded component self
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] mca: base: components_open: component self has no register function
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] mca: base: components_open: found loaded component tcp
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] mca: base: components_open: component tcp has no register function
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] select: initializing btl component self
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] select: init of component self returned success
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] select: initializing btl component sm
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] select: init of component sm returned success
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] select: initializing btl component tcp
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] select: initializing btl component self
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] select: init of component self returned success
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] select: initializing btl component sm
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] select: init of component sm returned success
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] select: initializing btl component tcp
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] select: initializing btl component self
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] select: init of component self returned success
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] select: initializing btl component sm
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] select: init of component sm returned success
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] select: initializing btl component tcp
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] select: initializing btl component self
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] select: init of component self returned success
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] select: initializing btl component sm
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] select: init of component sm returned success
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] select: initializing btl component tcp
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] mca: base: components_open: opening btl components
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] mca: base: components_open: found loaded component self
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] mca: base: components_open: component self has no register function
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] mca: base: components_open: component self open function successful
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] mca: base: components_open: found loaded component sm
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] mca: base: components_open: component sm has no register function
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] mca: base: components_open: component sm open function successful
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] mca: base: components_open: found loaded component tcp
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] mca: base: components_open: component tcp has no register function
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] mca: base: components_open: component tcp open function successful
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] mca: base: components_open: component self open function successful
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] mca: base: components_open: found loaded component sm
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] mca: base: components_open: component sm has no register function
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] mca: base: components_open: component sm open function successful
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] mca: base: components_open: found loaded component tcp
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] mca: base: components_open: component tcp has no register function
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] mca: base: components_open: component tcp open function successful
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] mca: base: components_open: component tcp open function successful
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] select: init of component tcp returned success
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] select: init of component tcp returned success
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] select: init of component tcp returned success
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] select: init of component tcp returned success
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] select: initializing btl component self
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] select: init of component self returned success
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] select: initializing btl component sm
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] select: init of component sm returned success
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] select: initializing btl component tcp
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] select: initializing btl component self
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] select: init of component self returned success
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] select: initializing btl component sm
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] select: init of component sm returned success
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] select: initializing btl component tcp
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] select: initializing btl component self
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] select: init of component self returned success
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] select: initializing btl component sm
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] select: init of component sm returned success
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] select: initializing btl component tcp
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] select: init of component tcp returned success
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] select: init of component tcp returned success
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] select: init of component tcp returned success
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[10782,1],6]) is on host: tik34x
  Process 2 ([[10782,1],0]) is on host: tik33x
  BTLs attempted: self sm tcp

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
Wed May 16 12:09:45 2012[1,6]<stderr>:*** The MPI_Init_thread() function was called before MPI_INIT was invoked.
Wed May 16 12:09:45 2012[1,6]<stderr>:*** This is disallowed by the MPI standard.
Wed May 16 12:09:45 2012[1,6]<stderr>:*** Your MPI job will now abort.
Wed May 16 12:09:45 2012[1,6]<stderr>:[tik34x:3331] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
Wed May 16 12:09:45 2012[1,1]<stderr>:*** The MPI_Init_thread() function was called before MPI_INIT was invoked.
Wed May 16 12:09:45 2012[1,1]<stderr>:*** This is disallowed by the MPI standard.
Wed May 16 12:09:45 2012[1,1]<stderr>:*** Your MPI job will now abort.
Wed May 16 12:09:45 2012[1,1]<stderr>:[tik33x:11430] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
Wed May 16 12:09:45 2012[1,2]<stderr>:*** The MPI_Init_thread() function was called before MPI_INIT was invoked.
Wed May 16 12:09:45 2012[1,2]<stderr>:*** This is disallowed by the MPI standard.
Wed May 16 12:09:45 2012[1,2]<stderr>:*** Your MPI job will now abort.
Wed May 16 12:09:45 2012[1,2]<stderr>:[tik33x:11449] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
Wed May 16 12:09:45 2012[1,3]<stderr>:*** The MPI_Init_thread() function was called before MPI_INIT was invoked.
Wed May 16 12:09:45 2012[1,3]<stderr>:*** This is disallowed by the MPI standard.
Wed May 16 12:09:45 2012[1,3]<stderr>:*** Your MPI job will now abort.
Wed May 16 12:09:45 2012[1,3]<stderr>:[tik33x:11452] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
Wed May 16 12:09:45 2012[1,5]<stderr>:*** The MPI_Init_thread() function was called before MPI_INIT was invoked.
Wed May 16 12:09:45 2012[1,5]<stderr>:*** This is disallowed by the MPI standard.
Wed May 16 12:09:45 2012[1,5]<stderr>:*** Your MPI job will now abort.
Wed May 16 12:09:45 2012[1,5]<stderr>:[tik34x:3333] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
mpirun has exited due to process rank 6 with PID 3270 on
node tik34x.ethz.ch exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[tik33x:11303] 4 more processes have sent help message help-mca-bml-r2.txt / unreachable proc
[tik33x:11303] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[tik33x:11303] 4 more processes have sent help message help-mpi-runtime / mpi_init:startup:internal-failure
-----------------FINISHED------------------

LAM 7.1.2/MPI 2 C++/ROMIO - Indiana University



--- End Message ---
lam-conf.lamd -d -v -I -H tik33x.ethz.ch -P 41544 -n 0 -o 0
n-1<11170> ssi:boot:rsh: launching locally
tkill: setting prefix to (null)
tkill: setting suffix to (null)
tkill: got killname back: /tmp/lam-raid@tik33x/lam-killfile
tkill: f_kill = "/tmp/lam-raid@tik33x/lam-killfile"
tkill: nothing to kill: "/tmp/lam-raid@tik33x/lam-killfile"
hboot: performing tkill
hboot: tkill -d 
hboot: booting...
hboot: fork /usr/bin/X11/lamd
[1]  11200 lamd -H tik33x.ethz.ch -P 41544 -n 0 -o 0 -d
n-1<11170> ssi:boot:rsh: successfully launched on n0 (tik33x.ethz.ch)
n-1<11170> ssi:boot:base:server: expecting connection from finite list
n-1<11200> ssi:boot:open: opening
n-1<11200> ssi:boot:open: opening boot module globus
n-1<11200> ssi:boot:open: opened boot module globus
n-1<11200> ssi:boot:open: opening boot module rsh
n-1<11200> ssi:boot:open: opened boot module rsh
n-1<11200> ssi:boot:open: opening boot module slurm
n-1<11200> ssi:boot:open: opened boot module slurm
n-1<11200> ssi:boot:select: initializing boot module slurm
n-1<11200> ssi:boot:slurm: not running under SLURM
n-1<11200> ssi:boot:select: boot module not available: slurm
n-1<11200> ssi:boot:select: initializing boot module rsh
n-1<11200> ssi:boot:rsh: module initializing
n-1<11200> ssi:boot:rsh:agent: /usr/bin/rsh
n-1<11200> ssi:boot:rsh:username: <same>
n-1<11200> ssi:boot:rsh:verbose: 1000
n-1<11200> ssi:boot:rsh:algorithm: linear
n-1<11200> ssi:boot:rsh:no_n: 0
n-1<11200> ssi:boot:rsh:no_profile: 0
n-1<11200> ssi:boot:rsh:fast: 0
n-1<11200> ssi:boot:rsh:ignore_stderr: 0
n-1<11200> ssi:boot:rsh:priority: 10
n-1<11200> ssi:boot:select: boot module available: rsh, priority: 10
n-1<11200> ssi:boot:select: initializing boot module globus
n-1<11200> ssi:boot:globus: globus-job-run not found, globus boot will not run
n-1<11200> ssi:boot:select: boot module not available: globus
n-1<11200> ssi:boot:select: finalizing boot module slurm
n-1<11200> ssi:boot:slurm: finalizing
n-1<11200> ssi:boot:select: closing boot module slurm
n-1<11200> ssi:boot:select: finalizing boot module globus
n-1<11200> ssi:boot:globus: finalizing
n-1<11200> ssi:boot:select: closing boot module globus
n-1<11200> ssi:boot:select: selected boot module rsh
n-1<11200> ssi:boot:send_lamd: getting node ID from command line
n-1<11200> ssi:boot:send_lamd: getting agent haddr from command line
n-1<11200> ssi:boot:send_lamd: getting agent port from command line
n-1<11200> ssi:boot:send_lamd: getting node ID from command line
n-1<11200> ssi:boot:send_lamd: connecting to 129.132.67.174:41544, node id 0
n-1<11170> ssi:boot:base:server: got connection from 129.132.67.174
n-1<11170> ssi:boot:base:server: this connection is expected (n0)
n-1<11200> ssi:boot:send_lamd: sending dli_port 41973
n-1<11170> ssi:boot:base:server: remote lamd is at 129.132.67.174:41973
n-1<11170> ssi:boot:base:linear: booting n1 (tik34x.ethz.ch)
n-1<11170> ssi:boot:rsh: starting lamd on (tik34x.ethz.ch)
n-1<11170> ssi:boot:rsh: starting on n1 (tik34x.ethz.ch): hboot -t -c 
lam-conf.lamd -d -v -s -I "-H tik33x.ethz.ch -P 41544 -n 1 -o 0"
n-1<11170> ssi:boot:rsh: launching remotely
n-1<11170> ssi:boot:rsh: attempting to execute: /usr/bin/rsh tik34x.ethz.ch -n 
-l raid 'echo $SHELL'
n-1<11170> ssi:boot:rsh: remote shell /bin/tcsh
n-1<11170> ssi:boot:rsh: attempting to execute: /usr/bin/rsh tik34x.ethz.ch -n 
-l raid hboot -t -c lam-conf.lamd -d -v -s -I '"-H tik33x.ethz.ch -P 41544 -n 1 
-o 0"'
tkill: setting prefix to (null)
tkill: setting suffix to (null)

LAM 7.1.2/MPI 2 C++/ROMIO - Indiana University

tkill: got killname back: /tmp/lam-raid@tik34x/lam-killfile
tkill: f_kill = "/tmp/lam-raid@tik34x/lam-killfile"
tkill: nothing to kill: "/tmp/lam-raid@tik34x/lam-killfile"
hboot: performing tkill
hboot: tkill -d 
hboot: booting...
hboot: fork /usr/bin/lamd
[1]   3246 lamd -H tik33x.ethz.ch -P 41544 -n 1 -o 0 -d
n-1<11170> ssi:boot:rsh: successfully launched on n1 (tik34x.ethz.ch)
n-1<11170> ssi:boot:base:server: expecting connection from finite list
n-1<11170> ssi:boot:base:server: got connection from 129.132.67.175
n-1<11170> ssi:boot:base:server: this connection is expected (n1)
n-1<11170> ssi:boot:base:server: remote lamd is at 129.132.67.175:37523
n-1<11170> ssi:boot:base:server: closing server socket
n-1<11170> ssi:boot:base:server: connecting to lamd at 129.132.67.174:51093
n-1<11170> ssi:boot:base:server: connected
n-1<11170> ssi:boot:base:server: sending number of links (2)
n-1<11170> ssi:boot:base:server: sending info: n0 (tik33x.ethz.ch)
n-1<11170> ssi:boot:base:server: sending info: n1 (tik34x.ethz.ch)
n-1<11170> ssi:boot:base:server: finished sending
n-1<11170> ssi:boot:base:server: disconnected from 129.132.67.174:51093
n-1<11170> ssi:boot:base:server: connecting to lamd at 129.132.67.175:53828
n-1<11170> ssi:boot:base:server: connected
n-1<11170> ssi:boot:base:server: sending number of links (2)
n-1<11170> ssi:boot:base:server: sending info: n0 (tik33x.ethz.ch)
n-1<11170> ssi:boot:base:server: sending info: n1 (tik34x.ethz.ch)
n-1<11170> ssi:boot:base:server: finished sending
n-1<11170> ssi:boot:base:server: disconnected from 129.132.67.175:53828
n-1<11170> ssi:boot:base:linear: finished
n-1<11170> ssi:boot:rsh: all RTE procs started
n-1<11170> ssi:boot:rsh: finalizing
n-1<11170> ssi:boot: Closing
n-1<11200> ssi:boot:rsh: finalizing
n-1<11200> ssi:boot: Closing
rm: cannot remove `setuplog.*': No such file or directory

 ========================   JOB MAP   ========================

 Data for node: Name: tik33x    Num procs: 4
        Process OMPI jobid: [10782,1] Process rank: 0
        Process OMPI jobid: [10782,1] Process rank: 1
        Process OMPI jobid: [10782,1] Process rank: 2
        Process OMPI jobid: [10782,1] Process rank: 3

 Data for node: Name: tik34x.ethz.ch    Num procs: 3
        Process OMPI jobid: [10782,1] Process rank: 4
        Process OMPI jobid: [10782,1] Process rank: 5
        Process OMPI jobid: [10782,1] Process rank: 6

 =============================================================
mpirun pid: 11303
Wed May 16 12:09:45 2012[1,0]<stderr>:[tik33x:11303] [[10782,0],0] 
odls:default:fork binding child [[10782,1],0] to slot_list 0:0
Wed May 16 12:09:45 2012[1,1]<stderr>:[tik33x:11303] [[10782,0],0] 
odls:default:fork binding child [[10782,1],1] to slot_list 0:1
Wed May 16 12:09:45 2012[1,2]<stderr>:[tik33x:11303] [[10782,0],0] 
odls:default:fork binding child [[10782,1],2] to slot_list 1:0
Wed May 16 12:09:45 2012[1,3]<stderr>:[tik33x:11303] [[10782,0],0] 
odls:default:fork binding child [[10782,1],3] to slot_list 1:1
Wed May 16 12:09:45 2012[1,0]<stdout>:Running DAL on tik33x
Wed May 16 12:09:45 2012[1,1]<stdout>:Running DAL on tik33x
Wed May 16 12:09:45 2012[1,2]<stdout>:Running DAL on tik33x
Wed May 16 12:09:45 2012[1,4]<stdout>:Running DAL on tik34x
Wed May 16 12:09:45 2012[1,5]<stdout>:Running DAL on tik34x
Wed May 16 12:09:45 2012[1,6]<stdout>:Running DAL on tik34x
Wed May 16 12:09:45 2012[1,3]<stdout>:Running DAL on tik33x
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] mca: base: 
components_open: Looking for btl components
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] mca: base: 
components_open: Looking for btl components
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] mca: base: 
components_open: Looking for btl components
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] mca: base: 
components_open: Looking for btl components
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] mca: base: 
components_open: opening btl components
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] mca: base: 
components_open: found loaded component self
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] mca: base: 
components_open: component self has no register function
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] mca: base: 
components_open: component self open function successful
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] mca: base: 
components_open: found loaded component sm
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] mca: base: 
components_open: component sm has no register function
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] mca: base: 
components_open: opening btl components
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] mca: base: 
components_open: found loaded component self
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] mca: base: 
components_open: component self has no register function
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] mca: base: 
components_open: component self open function successful
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] mca: base: 
components_open: found loaded component sm
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] mca: base: 
components_open: component sm has no register function
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] mca: base: 
components_open: component sm open function successful
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] mca: base: 
components_open: found loaded component tcp
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] mca: base: 
components_open: component tcp has no register function
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] mca: base: 
components_open: opening btl components
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] mca: base: 
components_open: found loaded component self
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] mca: base: 
components_open: component self has no register function
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] mca: base: 
components_open: component self open function successful
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] mca: base: 
components_open: found loaded component sm
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] mca: base: 
components_open: component sm has no register function
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] mca: base: 
components_open: component sm open function successful
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] mca: base: 
components_open: found loaded component tcp
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] mca: base: 
components_open: component tcp has no register function
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] mca: base: 
components_open: component sm open function successful
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] mca: base: 
components_open: found loaded component tcp
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] mca: base: 
components_open: component tcp has no register function
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] mca: base: 
components_open: component tcp open function successful
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] mca: base: 
components_open: component tcp open function successful
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] mca: base: 
components_open: component tcp open function successful
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] mca: base: 
components_open: opening btl components
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] mca: base: 
components_open: found loaded component self
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] mca: base: 
components_open: component self has no register function
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] mca: base: 
components_open: component self open function successful
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] mca: base: 
components_open: found loaded component sm
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] mca: base: 
components_open: component sm has no register function
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] mca: base: 
components_open: component sm open function successful
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] mca: base: 
components_open: found loaded component tcp
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] mca: base: 
components_open: component tcp has no register function
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] mca: base: 
components_open: component tcp open function successful
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] mca: base: 
components_open: Looking for btl components
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] mca: base: 
components_open: Looking for btl components
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] mca: base: 
components_open: Looking for btl components
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] mca: base: 
components_open: opening btl components
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] mca: base: 
components_open: found loaded component self
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] mca: base: 
components_open: component self has no register function
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] mca: base: 
components_open: component self open function successful
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] mca: base: 
components_open: found loaded component sm
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] mca: base: 
components_open: component sm has no register function
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] mca: base: 
components_open: component sm open function successful
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] mca: base: 
components_open: opening btl components
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] mca: base: 
components_open: found loaded component self
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] mca: base: 
components_open: component self has no register function
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] mca: base: 
components_open: found loaded component tcp
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] mca: base: 
components_open: component tcp has no register function
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] select: initializing btl 
component self
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] select: init of component 
self returned success
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] select: initializing btl 
component sm
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] select: init of component 
sm returned success
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] select: initializing btl 
component tcp
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] select: initializing btl 
component self
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] select: init of component 
self returned success
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] select: initializing btl 
component sm
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] select: init of component 
sm returned success
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] select: initializing btl 
component tcp
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] select: initializing btl 
component self
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] select: init of component 
self returned success
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] select: initializing btl 
component sm
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] select: init of component 
sm returned success
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] select: initializing btl 
component tcp
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] select: initializing btl 
component self
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] select: init of component 
self returned success
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] select: initializing btl 
component sm
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] select: init of component 
sm returned success
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] select: initializing btl 
component tcp
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] mca: base: 
components_open: opening btl components
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] mca: base: 
components_open: found loaded component self
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] mca: base: 
components_open: component self has no register function
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] mca: base: 
components_open: component self open function successful
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] mca: base: 
components_open: found loaded component sm
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] mca: base: 
components_open: component sm has no register function
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] mca: base: 
components_open: component sm open function successful
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] mca: base: 
components_open: found loaded component tcp
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] mca: base: 
components_open: component tcp has no register function
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] mca: base: 
components_open: component tcp open function successful
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] mca: base: 
components_open: component self open function successful
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] mca: base: 
components_open: found loaded component sm
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] mca: base: 
components_open: component sm has no register function
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] mca: base: 
components_open: component sm open function successful
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] mca: base: 
components_open: found loaded component tcp
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] mca: base: 
components_open: component tcp has no register function
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] mca: base: 
components_open: component tcp open function successful
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] mca: base: 
components_open: component tcp open function successful
Wed May 16 12:09:45 2012[1,1]<stddiag>:[tik33x:11430] select: init of component 
tcp returned success
Wed May 16 12:09:45 2012[1,2]<stddiag>:[tik33x:11449] select: init of component 
tcp returned success
Wed May 16 12:09:45 2012[1,0]<stddiag>:[tik33x:11429] select: init of component 
tcp returned success
Wed May 16 12:09:45 2012[1,3]<stddiag>:[tik33x:11452] select: init of component 
tcp returned success
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] select: initializing btl 
component self
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] select: init of component 
self returned success
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] select: initializing btl 
component sm
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] select: init of component 
sm returned success
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] select: initializing btl 
component tcp
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] select: initializing btl 
component self
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] select: init of component 
self returned success
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] select: initializing btl 
component sm
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] select: init of component 
sm returned success
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] select: initializing btl 
component tcp
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] select: initializing btl 
component self
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] select: init of component 
self returned success
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] select: initializing btl 
component sm
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] select: init of component 
sm returned success
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] select: initializing btl 
component tcp
Wed May 16 12:09:45 2012[1,4]<stddiag>:[tik34x:03332] select: init of component 
tcp returned success
Wed May 16 12:09:45 2012[1,6]<stddiag>:[tik34x:03331] select: init of component 
tcp returned success
Wed May 16 12:09:45 2012[1,5]<stddiag>:[tik34x:03333] select: init of component 
tcp returned success
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[10782,1],6]) is on host: tik34x
  Process 2 ([[10782,1],0]) is on host: tik33x
  BTLs attempted: self sm tcp

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
Wed May 16 12:09:45 2012[1,6]<stderr>:*** The MPI_Init_thread() function was 
called before MPI_INIT was invoked.
Wed May 16 12:09:45 2012[1,6]<stderr>:*** This is disallowed by the MPI 
standard.
Wed May 16 12:09:45 2012[1,6]<stderr>:*** Your MPI job will now abort.
Wed May 16 12:09:45 2012[1,6]<stderr>:[tik34x:3331] Abort before MPI_INIT 
completed successfully; not able to guarantee that all other processes were 
killed!
Wed May 16 12:09:45 2012[1,1]<stderr>:*** The MPI_Init_thread() function was 
called before MPI_INIT was invoked.
Wed May 16 12:09:45 2012[1,1]<stderr>:*** This is disallowed by the MPI 
standard.
Wed May 16 12:09:45 2012[1,1]<stderr>:*** Your MPI job will now abort.
Wed May 16 12:09:45 2012[1,1]<stderr>:[tik33x:11430] Abort before MPI_INIT 
completed successfully; not able to guarantee that all other processes were 
killed!
Wed May 16 12:09:45 2012[1,2]<stderr>:*** The MPI_Init_thread() function was 
called before MPI_INIT was invoked.
Wed May 16 12:09:45 2012[1,2]<stderr>:*** This is disallowed by the MPI 
standard.
Wed May 16 12:09:45 2012[1,2]<stderr>:*** Your MPI job will now abort.
Wed May 16 12:09:45 2012[1,2]<stderr>:[tik33x:11449] Abort before MPI_INIT 
completed successfully; not able to guarantee that all other processes were 
killed!
Wed May 16 12:09:45 2012[1,3]<stderr>:*** The MPI_Init_thread() function was 
called before MPI_INIT was invoked.
Wed May 16 12:09:45 2012[1,3]<stderr>:*** This is disallowed by the MPI 
standard.
Wed May 16 12:09:45 2012[1,3]<stderr>:*** Your MPI job will now abort.
Wed May 16 12:09:45 2012[1,3]<stderr>:[tik33x:11452] Abort before MPI_INIT 
completed successfully; not able to guarantee that all other processes were 
killed!
Wed May 16 12:09:45 2012[1,5]<stderr>:*** The MPI_Init_thread() function was 
called before MPI_INIT was invoked.
Wed May 16 12:09:45 2012[1,5]<stderr>:*** This is disallowed by the MPI 
standard.
Wed May 16 12:09:45 2012[1,5]<stderr>:*** Your MPI job will now abort.
Wed May 16 12:09:45 2012[1,5]<stderr>:[tik34x:3333] Abort before MPI_INIT 
completed successfully; not able to guarantee that all other processes were 
killed!
--------------------------------------------------------------------------
mpirun has exited due to process rank 6 with PID 3270 on
node tik34x.ethz.ch exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[tik33x:11303] 4 more processes have sent help message help-mca-bml-r2.txt / 
unreachable proc
[tik33x:11303] Set MCA parameter "orte_base_help_aggregate" to 0 to see all 
help / error messages
[tik33x:11303] 4 more processes have sent help message help-mpi-runtime / 
mpi_init:startup:internal-failure
-----------------FINISHED------------------

LAM 7.1.2/MPI 2 C++/ROMIO - Indiana University

