On 30.11.2009 at 20:07, Ondrej Glembek wrote:
I definitely compiled the package with the --with-sge flag... Here's my
configure log:
./configure --prefix=/homes/kazi/glembek/share/openmpi-1.3.3-64 \
    --with-sge --enable-shared --enable-static \
    --host=x86_64-linux --build=x86_64-linux NM=x86_64-linux-nm
Is there a list of valid values for --host, --build and NM, and what
is NM for? From ./configure --help I would "assume" that one can tell
Open MPI to prepare to BUILD on a PPC platform even though I'm issuing
the command on an x86 machine, and that the result of the PPC compile
should then run on x86_64. Since build and host are the same in your
case, maybe you can leave these options out?
Just to mention one more interesting thing: when, by luck, SGE places
all of the job's slots on the same machine (i.e. the SMP case),
everything works without problems...
Then it will just fork the processes locally; there is no need to use
qrsh at all.
Is there any way to force ssh to be put in front of the (...) term?
Using SSH directly would bypass SGE's startup. What are your entries
for qrsh_daemon and so on in SGE's configuration? Which version of SGE
are you using?
But I think the real problem is that Open MPI assumes you are outside
of SGE and so uses a different startup method. Are you resetting any
of SGE's environment variables (like $JOB_ID) in your custom starter
method?
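One way to check this from inside the starter method is to report which of the variables normally set by SGE are missing at that point. The following is only a sketch: the function name missing_sge_vars is made up for illustration, and the variable list (JOB_ID, PE_HOSTFILE, SGE_ROOT, ARC) is an assumption about which variables matter for Open MPI's SGE detection, so adjust it to your installation.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every variable in the list
# below that is unset or empty in the current environment.
# The list itself is an assumption, not taken from this thread.
missing_sge_vars() {
    for name in JOB_ID PE_HOSTFILE SGE_ROOT ARC; do
        # Indirect lookup of the variable called $name (POSIX sh).
        eval "value=\${$name:-}"
        [ -n "$value" ] || printf '%s\n' "$name"
    done
}
```

Calling `missing_sge_vars >&2` right before the `exec "$@"` line of the starter would write the missing names to the job's error file; an empty result means none of the listed variables was lost.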
-- Reuti
Thanx
Ondrej
Reuti wrote:
On 30.11.2009 at 18:46, Ondrej Glembek wrote:
Hi, thanx for reply...
I tried to dump the $@ before calling the exec and here it is:
( test ! -r ./.profile || . ./.profile;
  PATH=/homes/kazi/glembek/share/openmpi-1.3.3-64/bin:$PATH ; export PATH ;
  LD_LIBRARY_PATH=/homes/kazi/glembek/share/openmpi-1.3.3-64/lib:$LD_LIBRARY_PATH ;
  export LD_LIBRARY_PATH ;
  /homes/kazi/glembek/share/openmpi-1.3.3-64/bin/orted -mca ess env
  -mca orte_ess_jobid 3870359552 -mca orte_ess_vpid 1
  -mca orte_ess_num_procs 2
  --hnp-uri "3870359552.0;tcp://147.229.8.134:53727"
  --mca pls_gridengine_verbose 1 --output-filename mpi.log )
It looks like the line gets constructed in
orte/mca/plm/rsh/plm_rsh_module.c and depends on the shell...
Still I wonder, why mpiexec calls the starter.sh... I thought the
starter was supposed to call the script which wraps a call to
mpiexec...
Correct. This will happen for the master node of this job, i.e.
where the jobscript is executed. But it will also be used for the
qrsh -inherit calls. I wonder about one thing: I see only a call
to "orted" and not the above sub-shell on my machines. Did you
compile Open MPI with --with-sge?
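A quick way to answer this is to look for the gridengine components in the output of ompi_info. A minimal sketch, assuming the ompi_info of the installation in question is the first one in PATH; the helper name has_gridengine is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the text on stdin mentions a
# gridengine component, fails otherwise.
has_gridengine() {
    grep -q 'gridengine'
}

# Typical use (needs an Open MPI installation, so it is left commented):
#   ompi_info | has_gridengine && echo "SGE support is compiled in"
```

If nothing matches, the build most likely has no SGE support, and the rsh/ssh launcher would be used instead.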
The original call above would be "ssh node_xy ( test ! ....)",
which seems to work for ssh and rsh.
Just one note: with the starter script you will lose the PATH and
LD_LIBRARY_PATH settings, as a new shell is created. It might be
necessary to set them again in your starter method.
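As a rough illustration of setting them again, the starter could prepend the Open MPI directories itself before it starts the job. This is only a sketch: the function name prepend_ompi_paths is invented here, and the prefix passed to it has to match your own installation.

```shell
#!/bin/sh
# Hypothetical helper: put an Open MPI installation in front of
# PATH and LD_LIBRARY_PATH. $1 is the installation prefix.
prepend_ompi_paths() {
    prefix=$1
    PATH=$prefix/bin:$PATH
    # Avoid a trailing ":" when LD_LIBRARY_PATH was empty or unset.
    LD_LIBRARY_PATH=$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
    export PATH LD_LIBRARY_PATH
}
```

In the starter method this would be called just before the final `exec "$@"`, for example as `prepend_ompi_paths /homes/kazi/glembek/share/openmpi-1.3.3-64`, using the prefix from the configure line in this thread.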
-- Reuti
Am I not right???
Ondrej
Reuti wrote:
Hi,
On 30.11.2009 at 16:33, Ondrej Glembek wrote:
we are using a custom starter method in our SGE to launch our
jobs... It looks something like this:
#!/bin/sh
# ... we do whole bunch of stuff here
# start the job in this shell
exec "$@"
the "$@" should be replaced by the path to the jobscript (qsub)
or command (qrsh) plus the given options.
For the tasks spread to other nodes I get as argument:
" orted -mca ess env -mca orte_ess_jobid ...". Also no . ./.profile.
So I wonder where the . ./.profile is coming from. Can you put
a `sleep 60` or similar before the `exec ...` and grep the built
line from `ps -e f` before it crashes?
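Besides looking at `ps`, the starter itself can record how its arguments were split, which shows whether the "(" arrives as a word of its own or as part of one long string. A small sketch; the function name dump_args and the log file used in the usage note are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print every argument on its own line, numbered
# and wrapped in <...> so that embedded spaces become visible.
dump_args() {
    i=0
    for arg in "$@"; do
        i=$((i + 1))
        printf 'arg[%d]=<%s>\n' "$i" "$arg"
    done
}
```

In the starter, a line such as `dump_args "$@" >> /tmp/starter_args.log` placed before `exec "$@"` would keep the constructed command line even if the job crashes right afterwards.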
-- Reuti
The trouble is that mpiexec passes a command which looks like
this:
( . ./.profile ..... )
which, however, is not a valid exec argument...
Is there any way to tell mpiexec to run it in a separate script???
Any idea how to solve this???
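One conceivable workaround on the starter side, shown here only as an untested sketch and not as the fix recommended in this thread, is to hand the line to a shell whenever its first word is not a command. The function name start_job is hypothetical, and the approach assumes the arguments arrive already quoted for a shell, as they would be for ssh or rsh.

```shell
#!/bin/sh
# Hypothetical replacement for a bare `exec "$@"` in a starter method.
start_job() {
    if command -v "$1" >/dev/null 2>&1; then
        # Normal case: the first word is a command (jobscript, orted, ...).
        "$@"
    else
        # The first word is shell syntax such as "(": let sh parse the
        # whole line. Relies on the arguments being shell-quoted already.
        sh -c "$*"
    fi
}
```

The last line of the starter would then be `start_job "$@"`. Putting `exec` in front of both branches would avoid keeping an extra shell process around, at the cost of the function never returning.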
Thanx
Ondrej Glembek
--
Ondrej Glembek, PhD student    E-mail: glem...@fit.vutbr.cz
UPGM FIT VUT Brno, L226        Web:    http://www.fit.vutbr.cz/~glembek
Bozetechova 2, 612 66          Phone:  +420 54114-1292
Brno, Czech Republic           Fax:    +420 54114-1290
ICQ: 93233896
GPG: C050 A6DC 7291 6776 9B69 BB11 C033 D756 6F33 DE3C
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users