Hi all,
I'm trying to run an Open MPI application on an OAR cluster. I think
the cluster is configured correctly, but I still have problems when I
run mpirun:
apellegr@m45-037:~$ mpirun -prefix /n/poolfs/z/home/apellegr/openmpi
-machinefile $OAR_FILE_NODES -mca pls_rsh_agent "oarsh" -np 10
/n/poolfs/z/home/apellegr/mpi_test/hello_world.x86
bash: -c: line 0: syntax error near unexpected token `('
bash: -c: line 0: ` set path = ( /n/poolfs/z/home/apellegr/openmpi/
bin $path ) ; if ( $?LD_LIBRARY_PATH == 1 ) set OMPI_have_llp ; if
( $?LD_LIBRARY_PATH == 0 ) setenv LD_LIBRARY_PATH /n/poolfs/z/home/
apellegr/openmpi/lib ; if ( $?OMPI_have_llp == 1 ) setenv
LD_LIBRARY_PATH /n/poolfs/z/home/apellegr/openmpi/lib:
$LD_LIBRARY_PATH ; /n/poolfs/z/home/apellegr/openmpi/bin/orted --
bootproxy 1 --name 0.0.4 --num_procs 5 --vpid_start 0 --nodename
m45-040.pool --universe apell...@m45-037.pool:default-universe-29482
--nsreplica "0.0.0;tcp://10.11.45.37:36790" --gprreplica
"0.0.0;tcp://10.11.45.37:36790"'
bash: -c: line 0: syntax error near unexpected token `('
bash: -c: line 0: ` set path = ( /n/poolfs/z/home/apellegr/openmpi/
bin $path ) ; if ( $?LD_LIBRARY_PATH == 1 ) set OMPI_have_llp ; if
( $?LD_LIBRARY_PATH == 0 ) setenv LD_LIBRARY_PATH /n/poolfs/z/home/
apellegr/openmpi/lib ; if ( $?OMPI_have_llp == 1 ) setenv
LD_LIBRARY_PATH /n/poolfs/z/home/apellegr/openmpi/lib:
$LD_LIBRARY_PATH ; /n/poolfs/z/home/apellegr/openmpi/bin/orted --
bootproxy 1 --name 0.0.2 --num_procs 5 --vpid_start 0 --nodename
m45-038.pool --universe apell...@m45-037.pool:default-universe-29482
--nsreplica "0.0.0;tcp://10.11.45.37:36790" --gprreplica
"0.0.0;tcp://10.11.45.37:36790"'
[m45-037.pool:29482] ERROR: A daemon on node m45-038.pool failed to
start as expected.
[m45-037.pool:29482] ERROR: There may be more information available
from
[m45-037.pool:29482] ERROR: the remote shell (see above).
[m45-037.pool:29482] ERROR: The daemon exited unexpectedly with
status 2.
bash: -c: line 0: syntax error near unexpected token `('
bash: -c: line 0: ` set path = ( /n/poolfs/z/home/apellegr/openmpi/
bin $path ) ; if ( $?LD_LIBRARY_PATH == 1 ) set OMPI_have_llp ; if
( $?LD_LIBRARY_PATH == 0 ) setenv LD_LIBRARY_PATH /n/poolfs/z/home/
apellegr/openmpi/lib ; if ( $?OMPI_have_llp == 1 ) setenv
LD_LIBRARY_PATH /n/poolfs/z/home/apellegr/openmpi/lib:
$LD_LIBRARY_PATH ; /n/poolfs/z/home/apellegr/openmpi/bin/orted --
bootproxy 1 --name 0.0.3 --num_procs 5 --vpid_start 0 --nodename
m45-039.pool --universe apell...@m45-037.pool:default-universe-29482
--nsreplica "0.0.0;tcp://10.11.45.37:36790" --gprreplica
"0.0.0;tcp://10.11.45.37:36790"'
[m45-037.pool:29482] ERROR: A daemon on node m45-039.pool failed to
start as expected.
[m45-037.pool:29482] ERROR: There may be more information available
from
[m45-037.pool:29482] ERROR: the remote shell (see above).
[m45-037.pool:29482] ERROR: The daemon exited unexpectedly with
status 2.
[m45-037.pool:29482] [0,0,0] ORTE_ERROR_LOG: Timeout in
file ../../../../orte/mca/pls/base/pls_base_orted_cmds.c at line 275
[m45-037.pool:29482] [0,0,0] ORTE_ERROR_LOG: Timeout in
file ../../../../../orte/mca/pls/rsh/pls_rsh_module.c at line 1158
[m45-037.pool:29482] [0,0,0] ORTE_ERROR_LOG: Timeout in
file ../../../../../orte/mca/errmgr/hnp/errmgr_hnp.c at line 90
[m45-037.pool:29482] ERROR: A daemon on node m45-040.pool failed to
start as expected.
[m45-037.pool:29482] ERROR: There may be more information available
from
[m45-037.pool:29482] ERROR: the remote shell (see above).
[m45-037.pool:29482] ERROR: The daemon exited unexpectedly with
status 2.
[m45-037.pool:29482] [0,0,0] ORTE_ERROR_LOG: Timeout in
file ../../../../orte/mca/pls/base/pls_base_orted_cmds.c at line 188
[m45-037.pool:29482] [0,0,0] ORTE_ERROR_LOG: Timeout in
file ../../../../../orte/mca/pls/rsh/pls_rsh_module.c at line 1190
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons for this job.
Returned value Timeout instead of ORTE_SUCCESS.
--------------------------------------------------------------------------
apellegr@m45-037:~$
If I run it with the option "-mca pls_rsh_debug 1" I get:
apellegr@m45-037:~$ mpirun -prefix /n/poolfs/z/home/apellegr/openmpi
-machinefile $OAR_FILE_NODES -mca pls_rsh_debug 1 -mca pls_rsh_agent
"oarsh" -np 10 /n/poolfs/z/home/apellegr/mpi_test/hello_world.x86
[m45-037.pool:29473] pls:rsh: local shell: 2 (tcsh)
[m45-037.pool:29473] pls:rsh: assuming same remote shell as local
shell
[m45-037.pool:29473] pls:rsh: remote shell: 2 (tcsh)
[m45-037.pool:29473] pls:rsh: final template argv:
[m45-037.pool:29473] pls:rsh: /usr/bin/oarsh <template> orted --
bootproxy 1 --name <template> --num_procs 5 --vpid_start 0 --
nodename <template> --universe apell...@m45-037.pool:default-
universe-29473 --nsreplica "0.0.0;tcp://10.11.45.37:55477" --
gprreplica "0.0.0;tcp://10.11.45.37:55477"
[m45-037.pool:29473] pls:rsh: launching on node m45-037.pool
[m45-037.pool:29473] pls:rsh: m45-037.pool is a LOCAL node
[m45-037.pool:29473] pls:rsh: reset PATH: /n/poolfs/z/home/apellegr/
openmpi/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/
bin:/usr/X11R6/bin:/n/poolfs/z/home/apellegr/openmpi/bin:/n/poolfs/z/
home/apellegr/openssl/bin
[m45-037.pool:29473] pls:rsh: reset LD_LIBRARY_PATH: /n/poolfs/z/
home/apellegr/openmpi/lib
[m45-037.pool:29473] pls:rsh: changing to directory /home/apellegr
[m45-037.pool:29473] pls:rsh: executing: (/n/poolfs/z/home/apellegr/
openmpi/bin/orted) orted --bootproxy 1 --name 0.0.1 --num_procs 5 --
vpid_start 0 --nodename m45-037.pool --universe
apell...@m45-037.pool:default-universe-29473 --nsreplica
"0.0.0;tcp://10.11.45.37:55477" --gprreplica "0.0.0;tcp://
10.11.45.37:55477" --set-sid [OAR_JOBID=597856 HOST=m45-037.pool
TERM=xterm SHELL=/bin/tcsh OAR_WORKING_DIRECTORY=/home/apellegr
SSH_CLIENT=10.11.0.4 50481 6667 OAR_USER=apellegr GROUP=csestudents
USER=apellegr SUDO_USER=oar OAR_WORKDIR=/home/apellegr
SUDO_UID=30143 HOSTTYPE=i486-linux USERNAME=apellegr OAR_JOB_NAME=
OAR_NODE_FILE=/var/lib/oar/597856 OAR_RESOURCE_PROPERTIES_FILE=/var/
lib/oar/597856_resources MAIL=/var/mail/oar PATH=/n/poolfs/z/home/
apellegr/openmpi/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/
bin:/sbin:/bin:/usr/X11R6/bin:/n/poolfs/z/home/apellegr/openmpi/bin:/
n/poolfs/z/home/apellegr/openssl/bin OAR_PROJECT_NAME=default
OAR_JOB_WALLTIME_SECONDS=7200 PWD=/home/apellegr HOME=/home/apellegr
SUDO_COMMAND=OAR SHLVL=2 OAR_FILE_NODES=/var/lib/oar/597856
OSTYPE=linux VENDOR=intel OAR_JOB_WALLTIME=2:0:0 MACHTYPE=i486
LOGNAME=apellegr OAR_NODEFILE=/var/lib/oar/597856 OAR_RESOURCE_FILE=/
var/lib/oar/597856 SUDO_GID=390 OAR_JOB_ID=597856 OAR_O_WORKDIR=/
home/apellegr _=/n/poolfs/z/home/apellegr/openmpi/bin/mpirun OLDPWD=/
home/apellegr/openmpi OMPI_MCA_rds_hostfile_path=/var/lib/oar/597856
OMPI_MCA_pls_rsh_debug=1 OMPI_MCA_pls_rsh_agent=oarsh
LD_LIBRARY_PATH=/n/poolfs/z/home/apellegr/openmpi/lib OMPI_MCA_seed=0]
[m45-037.pool:29473] pls:rsh: launching on node m45-038.pool
[m45-037.pool:29473] pls:rsh: m45-038.pool is a REMOTE node
[m45-037.pool:29473] pls:rsh: executing: (//usr/bin/oarsh) /usr/bin/
oarsh m45-038.pool set path = ( /n/poolfs/z/home/apellegr/openmpi/
bin $path ) ; if ( $?LD_LIBRARY_PATH == 1 ) set OMPI_have_llp ; if
( $?LD_LIBRARY_PATH == 0 ) setenv LD_LIBRARY_PATH /n/poolfs/z/home/
apellegr/openmpi/lib ; if ( $?OMPI_have_llp == 1 ) setenv
LD_LIBRARY_PATH /n/poolfs/z/home/apellegr/openmpi/lib:
$LD_LIBRARY_PATH ; /n/poolfs/z/home/apellegr/openmpi/bin/orted --
bootproxy 1 --name 0.0.2 --num_procs 5 --vpid_start 0 --nodename
m45-038.pool --universe apell...@m45-037.pool:default-universe-29473
--nsreplica "0.0.0;tcp://10.11.45.37:55477" --gprreplica
"0.0.0;tcp://10.11.45.37:55477" [OAR_JOBID=597856 HOST=m45-037.pool
TERM=xterm SHELL=/bin/tcsh OAR_WORKING_DIRECTORY=/home/apellegr
SSH_CLIENT=10.11.0.4 50481 6667 OAR_USER=apellegr GROUP=csestudents
USER=apellegr SUDO_USER=oar OAR_WORKDIR=/home/apellegr
SUDO_UID=30143 HOSTTYPE=i486-linux USERNAME=apellegr OAR_JOB_NAME=
OAR_NODE_FILE=/var/lib/oar/597856 OAR_RESOURCE_PROPERTIES_FILE=/var/
lib/oar/597856_resources MAIL=/var/mail/oar PATH=/usr/local/sbin:/
usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/X11R6/bin:/n/poolfs/
z/home/apellegr/openmpi/bin:/n/poolfs/z/home/apellegr/openssl/bin
OAR_PROJECT_NAME=default OAR_JOB_WALLTIME_SECONDS=7200 PWD=/home/
apellegr HOME=/home/apellegr SUDO_COMMAND=OAR SHLVL=2
OAR_FILE_NODES=/var/lib/oar/597856 OSTYPE=linux VENDOR=intel
OAR_JOB_WALLTIME=2:0:0 MACHTYPE=i486 LOGNAME=apellegr OAR_NODEFILE=/
var/lib/oar/597856 OAR_RESOURCE_FILE=/var/lib/oar/597856
SUDO_GID=390 OAR_JOB_ID=597856 OAR_O_WORKDIR=/home/apellegr _=/n/
poolfs/z/home/apellegr/openmpi/bin/mpirun OLDPWD=/home/apellegr/
openmpi OMPI_MCA_rds_hostfile_path=/var/lib/oar/597856
OMPI_MCA_pls_rsh_debug=1 OMPI_MCA_pls_rsh_agent=oarsh OMPI_MCA_seed=0]
bash: -c: line 0: syntax error near unexpected token `('
bash: -c: line 0: ` set path = ( /n/poolfs/z/home/apellegr/openmpi/
bin $path ) ; if ( $?LD_LIBRARY_PATH == 1 ) set OMPI_have_llp ; if
( $?LD_LIBRARY_PATH == 0 ) setenv LD_LIBRARY_PATH /n/poolfs/z/home/
apellegr/openmpi/lib ; if ( $?OMPI_have_llp == 1 ) setenv
LD_LIBRARY_PATH /n/poolfs/z/home/apellegr/openmpi/lib:
$LD_LIBRARY_PATH ; /n/poolfs/z/home/apellegr/openmpi/bin/orted --
bootproxy 1 --name 0.0.2 --num_procs 5 --vpid_start 0 --nodename
m45-038.pool --universe apell...@m45-037.pool:default-universe-29473
--nsreplica "0.0.0;tcp://10.11.45.37:55477" --gprreplica
"0.0.0;tcp://10.11.45.37:55477"'
[m45-037.pool:29473] pls:rsh: launching on node m45-039.pool
[m45-037.pool:29473] ERROR: A daemon on node m45-038.pool failed to
start as expected.
[m45-037.pool:29473] ERROR: There may be more information available
from
[m45-037.pool:29473] ERROR: the remote shell (see above).
[m45-037.pool:29473] ERROR: The daemon exited unexpectedly with
status 2.
[m45-037.pool:29473] pls:rsh: m45-039.pool is a REMOTE node
[m45-037.pool:29473] pls:rsh: executing: (//usr/bin/oarsh) /usr/bin/
oarsh m45-039.pool set path = ( /n/poolfs/z/home/apellegr/openmpi/
bin $path ) ; if ( $?LD_LIBRARY_PATH == 1 ) set OMPI_have_llp ; if
( $?LD_LIBRARY_PATH == 0 ) setenv LD_LIBRARY_PATH /n/poolfs/z/home/
apellegr/openmpi/lib ; if ( $?OMPI_have_llp == 1 ) setenv
LD_LIBRARY_PATH /n/poolfs/z/home/apellegr/openmpi/lib:
$LD_LIBRARY_PATH ; /n/poolfs/z/home/apellegr/openmpi/bin/orted --
bootproxy 1 --name 0.0.3 --num_procs 5 --vpid_start 0 --nodename
m45-039.pool --universe apell...@m45-037.pool:default-universe-29473
--nsreplica "0.0.0;tcp://10.11.45.37:55477" --gprreplica
"0.0.0;tcp://10.11.45.37:55477" [OAR_JOBID=597856 HOST=m45-037.pool
TERM=xterm SHELL=/bin/tcsh OAR_WORKING_DIRECTORY=/home/apellegr
SSH_CLIENT=10.11.0.4 50481 6667 OAR_USER=apellegr GROUP=csestudents
USER=apellegr SUDO_USER=oar OAR_WORKDIR=/home/apellegr
SUDO_UID=30143 HOSTTYPE=i486-linux USERNAME=apellegr OAR_JOB_NAME=
OAR_NODE_FILE=/var/lib/oar/597856 OAR_RESOURCE_PROPERTIES_FILE=/var/
lib/oar/597856_resources MAIL=/var/mail/oar PATH=/usr/local/sbin:/
usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/X11R6/bin:/n/poolfs/
z/home/apellegr/openmpi/bin:/n/poolfs/z/home/apellegr/openssl/bin
OAR_PROJECT_NAME=default OAR_JOB_WALLTIME_SECONDS=7200 PWD=/home/
apellegr HOME=/home/apellegr SUDO_COMMAND=OAR SHLVL=2
OAR_FILE_NODES=/var/lib/oar/597856 OSTYPE=linux VENDOR=intel
OAR_JOB_WALLTIME=2:0:0 MACHTYPE=i486 LOGNAME=apellegr OAR_NODEFILE=/
var/lib/oar/597856 OAR_RESOURCE_FILE=/var/lib/oar/597856
SUDO_GID=390 OAR_JOB_ID=597856 OAR_O_WORKDIR=/home/apellegr _=/n/
poolfs/z/home/apellegr/openmpi/bin/mpirun OLDPWD=/home/apellegr/
openmpi OMPI_MCA_rds_hostfile_path=/var/lib/oar/597856
OMPI_MCA_pls_rsh_debug=1 OMPI_MCA_pls_rsh_agent=oarsh OMPI_MCA_seed=0]
bash: -c: line 0: syntax error near unexpected token `('
bash: -c: line 0: ` set path = ( /n/poolfs/z/home/apellegr/openmpi/
bin $path ) ; if ( $?LD_LIBRARY_PATH == 1 ) set OMPI_have_llp ; if
( $?LD_LIBRARY_PATH == 0 ) setenv LD_LIBRARY_PATH /n/poolfs/z/home/
apellegr/openmpi/lib ; if ( $?OMPI_have_llp == 1 ) setenv
LD_LIBRARY_PATH /n/poolfs/z/home/apellegr/openmpi/lib:
$LD_LIBRARY_PATH ; /n/poolfs/z/home/apellegr/openmpi/bin/orted --
bootproxy 1 --name 0.0.3 --num_procs 5 --vpid_start 0 --nodename
m45-039.pool --universe apell...@m45-037.pool:default-universe-29473
--nsreplica "0.0.0;tcp://10.11.45.37:55477" --gprreplica
"0.0.0;tcp://10.11.45.37:55477"'
[m45-037.pool:29473] pls:rsh: launching on node m45-040.pool
[m45-037.pool:29473] ERROR: A daemon on node m45-039.pool failed to
start as expected.
[m45-037.pool:29473] ERROR: There may be more information available
from
[m45-037.pool:29473] ERROR: the remote shell (see above).
[m45-037.pool:29473] ERROR: The daemon exited unexpectedly with
status 2.
[m45-037.pool:29473] pls:rsh: m45-040.pool is a REMOTE node
[m45-037.pool:29473] pls:rsh: executing: (//usr/bin/oarsh) /usr/bin/
oarsh m45-040.pool set path = ( /n/poolfs/z/home/apellegr/openmpi/
bin $path ) ; if ( $?LD_LIBRARY_PATH == 1 ) set OMPI_have_llp ; if
( $?LD_LIBRARY_PATH == 0 ) setenv LD_LIBRARY_PATH /n/poolfs/z/home/
apellegr/openmpi/lib ; if ( $?OMPI_have_llp == 1 ) setenv
LD_LIBRARY_PATH /n/poolfs/z/home/apellegr/openmpi/lib:
$LD_LIBRARY_PATH ; /n/poolfs/z/home/apellegr/openmpi/bin/orted --
bootproxy 1 --name 0.0.4 --num_procs 5 --vpid_start 0 --nodename
m45-040.pool --universe apell...@m45-037.pool:default-universe-29473
--nsreplica "0.0.0;tcp://10.11.45.37:55477" --gprreplica
"0.0.0;tcp://10.11.45.37:55477" [OAR_JOBID=597856 HOST=m45-037.pool
TERM=xterm SHELL=/bin/tcsh OAR_WORKING_DIRECTORY=/home/apellegr
SSH_CLIENT=10.11.0.4 50481 6667 OAR_USER=apellegr GROUP=csestudents
USER=apellegr SUDO_USER=oar OAR_WORKDIR=/home/apellegr
SUDO_UID=30143 HOSTTYPE=i486-linux USERNAME=apellegr OAR_JOB_NAME=
OAR_NODE_FILE=/var/lib/oar/597856 OAR_RESOURCE_PROPERTIES_FILE=/var/
lib/oar/597856_resources MAIL=/var/mail/oar PATH=/usr/local/sbin:/
usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/X11R6/bin:/n/poolfs/
z/home/apellegr/openmpi/bin:/n/poolfs/z/home/apellegr/openssl/bin
OAR_PROJECT_NAME=default OAR_JOB_WALLTIME_SECONDS=7200 PWD=/home/
apellegr HOME=/home/apellegr SUDO_COMMAND=OAR SHLVL=2
OAR_FILE_NODES=/var/lib/oar/597856 OSTYPE=linux VENDOR=intel
OAR_JOB_WALLTIME=2:0:0 MACHTYPE=i486 LOGNAME=apellegr OAR_NODEFILE=/
var/lib/oar/597856 OAR_RESOURCE_FILE=/var/lib/oar/597856
SUDO_GID=390 OAR_JOB_ID=597856 OAR_O_WORKDIR=/home/apellegr _=/n/
poolfs/z/home/apellegr/openmpi/bin/mpirun OLDPWD=/home/apellegr/
openmpi OMPI_MCA_rds_hostfile_path=/var/lib/oar/597856
OMPI_MCA_pls_rsh_debug=1 OMPI_MCA_pls_rsh_agent=oarsh OMPI_MCA_seed=0]
bash: -c: line 0: syntax error near unexpected token `('
bash: -c: line 0: ` set path = ( /n/poolfs/z/home/apellegr/openmpi/
bin $path ) ; if ( $?LD_LIBRARY_PATH == 1 ) set OMPI_have_llp ; if
( $?LD_LIBRARY_PATH == 0 ) setenv LD_LIBRARY_PATH /n/poolfs/z/home/
apellegr/openmpi/lib ; if ( $?OMPI_have_llp == 1 ) setenv
LD_LIBRARY_PATH /n/poolfs/z/home/apellegr/openmpi/lib:
$LD_LIBRARY_PATH ; /n/poolfs/z/home/apellegr/openmpi/bin/orted --
bootproxy 1 --name 0.0.4 --num_procs 5 --vpid_start 0 --nodename
m45-040.pool --universe apell...@m45-037.pool:default-universe-29473
--nsreplica "0.0.0;tcp://10.11.45.37:55477" --gprreplica
"0.0.0;tcp://10.11.45.37:55477"'
[m45-037.pool:29473] ERROR: A daemon on node m45-040.pool failed to
start as expected.
[m45-037.pool:29473] ERROR: There may be more information available
from
[m45-037.pool:29473] ERROR: the remote shell (see above).
[m45-037.pool:29473] ERROR: The daemon exited unexpectedly with
status 2.
[m45-037.pool:29473] [0,0,0] ORTE_ERROR_LOG: Timeout in
file ../../../../orte/mca/pls/base/pls_base_orted_cmds.c at line 188
[m45-037.pool:29473] [0,0,0] ORTE_ERROR_LOG: Timeout in
file ../../../../../orte/mca/pls/rsh/pls_rsh_module.c at line 1190
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons for this job.
Returned value Timeout instead of ORTE_SUCCESS.
--------------------------------------------------------------------------
apellegr@m45-037:~$
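One thing that stands out in the debug output: mpirun reports "assuming same remote shell as local shell" (tcsh), while the errors are printed by bash. So the tcsh-syntax command that mpirun builds appears to be parsed by bash on the remote nodes. The mismatch can be reproduced locally with the minimal sketch below. It assumes bash is installed, and /tmp/openmpi/bin is only a placeholder path, not the real install prefix.

```shell
#!/bin/sh
# Feed bash the same kind of csh-style prefix that mpirun generated for the
# remote nodes. /tmp/openmpi/bin is a placeholder, not the real prefix.
csh_cmd='set path = ( /tmp/openmpi/bin $path ) ; echo started'

# bash fails to parse "set path = ( ... )" and exits non-zero before
# running anything, so "echo started" is never reached.
if bash -c "$csh_cmd" >/dev/null 2>&1; then
    result="accepted"
else
    result="rejected"
fi
echo "bash $result the csh-style command"
```

This only illustrates the parse failure seen in the log. It does not show which shell oarsh actually starts on the remote side.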
Can anybody help me?
Thanks,
~Andrea