Jeez, I really can't read this morning: you are using Torque, and the mpiexec is the one from Open MPI. I can't help you then; someone else is going to have to. Sorry.
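(A quick generic sanity check for which launcher is being picked up:

which mpiexec

If that resolves under the Open MPI prefix, e.g. /opt/openmpi/bin/mpiexec, it is Open MPI's own launcher and not some other mpiexec that might be installed alongside Torque.)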
Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
On Jun 15, 2006, at 9:42 AM, Martin Schafföner wrote:
Hi,
I have been trying to set up Open MPI 1.0.3a1r10374 on our cluster and was partly successful. Partly, because the installation worked, and compiling a simple example and running it through the rsh pls also worked.
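(For reference, the working rsh launch was something like the same command I show below, just forcing the rsh launcher instead of TM:

mpiexec --mca pls rsh -np 1 `pwd`/openmpitest
)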
However, I'm the only user who has rsh access to the nodes; all other users must go through Torque and launch MPI apps using Torque's TM subsystem. That's where my problem starts: I was not successful in launching apps through TM.
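(For illustration, users are supposed to launch from inside a Torque job, with a minimal script roughly like the following; the resource request and names are just placeholders:

#!/bin/sh
#PBS -N mpitest
#PBS -l nodes=1:ppn=1
cd $PBS_O_WORKDIR
mpiexec -np 1 ./openmpitest
)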
The TM pls seems to be configured okay; I can see it making connections to the Torque MOM in the MOM's logfile. However, the app never gets run. Even if I request only one processor, mpiexec spawns several orteds in a row. Here is my session log (I kill mpiexec with Ctrl-C because it would otherwise run forever):
schaffoe@node16:~/tmp/mpitest> mpiexec -np 1 --mca pls_tm_debug 1 --mca pls tm `pwd`/openmpitest
[node16:03113] pls:tm: final top-level argv:
[node16:03113] pls:tm: orted --no-daemonize --bootproxy 1 --name --num_procs 2 --vpid_start 0 --nodename --universe schaffoe@node16:default-universe-3113 --nsreplica "0.0.0;tcp://192.168.1.16:60601" --gprreplica "0.0.0;tcp://192.168.1.16:60601"
[node16:03113] pls:tm: launching on node node16
[node16:03113] pls:tm: found /opt/openmpi/bin/orted
[node16:03113] pls:tm: not oversubscribed -- setting mpi_yield_when_idle to 0
[node16:03113] pls:tm: executing: orted --no-daemonize --bootproxy 1 --name 0.0.1 --num_procs 2 --vpid_start 0 --nodename node16 --universe schaffoe@node16:default-universe-3113 --nsreplica "0.0.0;tcp://192.168.1.16:60601" --gprreplica "0.0.0;tcp://192.168.1.16:60601"
[node16:03113] pls:tm: final top-level argv:
[node16:03113] pls:tm: orted --no-daemonize --bootproxy 1 --name --num_procs 3 --vpid_start 0 --nodename --universe schaffoe@node16:default-universe-3113 --nsreplica "0.0.0;tcp://192.168.1.16:60601" --gprreplica "0.0.0;tcp://192.168.1.16:60601"
[node16:03113] pls:tm: launching on node node16
[node16:03113] pls:tm: not oversubscribed -- setting mpi_yield_when_idle to 0
[node16:03113] pls:tm: executing: orted --no-daemonize --bootproxy 1 --name 0.0.2 --num_procs 3 --vpid_start 0 --nodename node16 --universe schaffoe@node16:default-universe-3113 --nsreplica "0.0.0;tcp://192.168.1.16:60601" --gprreplica "0.0.0;tcp://192.168.1.16:60601"
[node16:03113] pls:tm: final top-level argv:
[node16:03113] pls:tm: orted --no-daemonize --bootproxy 1 --name --num_procs 4 --vpid_start 0 --nodename --universe schaffoe@node16:default-universe-3113 --nsreplica "0.0.0;tcp://192.168.1.16:60601" --gprreplica "0.0.0;tcp://192.168.1.16:60601"
[node16:03113] pls:tm: launching on node node16
[node16:03113] pls:tm: not oversubscribed -- setting mpi_yield_when_idle to 0
[node16:03113] pls:tm: executing: orted --no-daemonize --bootproxy 1 --name 0.0.3 --num_procs 4 --vpid_start 0 --nodename node16 --universe schaffoe@node16:default-universe-3113 --nsreplica "0.0.0;tcp://192.168.1.16:60601" --gprreplica "0.0.0;tcp://192.168.1.16:60601"
mpiexec: killing job...
[node16:03113] pls:tm: final top-level argv:
[node16:03113] pls:tm: orted --no-daemonize --bootproxy 1 --name --num_procs 5 --vpid_start 0 --nodename --universe schaffoe@node16:default-universe-3113 --nsreplica "0.0.0;tcp://192.168.1.16:60601" --gprreplica "0.0.0;tcp://192.168.1.16:60601"
[node16:03113] pls:tm: launching on node node16
[node16:03113] pls:tm: not oversubscribed -- setting mpi_yield_when_idle to 0
[node16:03113] pls:tm: executing: orted --no-daemonize --bootproxy 1 --name 0.0.4 --num_procs 5 --vpid_start 0 --nodename node16 --universe schaffoe@node16:default-universe-3113 --nsreplica "0.0.0;tcp://192.168.1.16:60601" --gprreplica "0.0.0;tcp://192.168.1.16:60601"
[node16:03113] pls:tm: final top-level argv:
[node16:03113] pls:tm: orted --no-daemonize --bootproxy 1 --name --num_procs 6 --vpid_start 0 --nodename --universe schaffoe@node16:default-universe-3113 --nsreplica "0.0.0;tcp://192.168.1.16:60601" --gprreplica "0.0.0;tcp://192.168.1.16:60601"
[node16:03113] pls:tm: launching on node node16
[node16:03113] pls:tm: not oversubscribed -- setting mpi_yield_when_idle to 0
[node16:03113] pls:tm: executing: orted --no-daemonize --bootproxy 1 --name 0.0.5 --num_procs 6 --vpid_start 0 --nodename node16 --universe schaffoe@node16:default-universe-3113 --nsreplica "0.0.0;tcp://192.168.1.16:60601" --gprreplica "0.0.0;tcp://192.168.1.16:60601"
--------------------------------------------------------------------------
WARNING: mpiexec encountered an abnormal exit.
This means that mpiexec exited before it received notification that all started processes had terminated. You should double check and ensure that there are no runaway processes still executing.
--------------------------------------------------------------------------
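(For reference, one way to confirm that the TM components are actually built into the installation, assuming ompi_info comes from the same Open MPI install:

ompi_info | grep tm
ompi_info --param pls tm
)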
I read in the README that the TM pls is working, whereas the LaTeX user's guide says that only rsh and bproc are supported. I am confused...
Can anybody shed some light on this?
Regards,
--
Martin Schafföner
Cognitive Systems Group, Institute of Electronics, Signal Processing and Communication Technologies, Department of Electrical Engineering, Otto-von-Guericke University Magdeburg
Phone: +49 391 6720063
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users