Hi,
On 01.12.2009 at 10:00, Ondrej Glembek wrote:
Reuti wrote:
./configure --prefix=/homes/kazi/glembek/share/openmpi-1.3.3-64 \
  --with-sge --enable-shared --enable-static --host=x86_64-linux \
  --build=x86_64-linux NM=x86_64-linux-nm
Is there any list of valid values for --host, --build and NM, and what
is NM for? From ./configure --help I would "assume" that one can tell
Open MPI to prepare to BUILD on a PPC platform, although I'm issuing
the command on an x86 machine, and the result of the PPC compile should
run on x86_64. Maybe you can leave these options out, as they are the
same in your case?
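In autoconf terms, --build names the machine that does the compiling and --host the machine the result will run on; when both are the same, the two flags (and NM, which as far as I know only tells libtool which `nm` symbol lister to use) can simply be dropped. A minimal sketch, assuming the `<arch>-linux` triplet form used in the command above: the hypothetical helper `cross_flags` prints the flags only when the two architectures actually differ.

```shell
#!/bin/sh
# Hypothetical helper: print --host/--build only when cross-compiling.
# The "<arch>-linux" triplet form copies the configure command above; it is
# an assumption, not the full set of values configure accepts.
cross_flags() {
  target_arch="$1"                   # machine the binaries should run on
  build_arch="${2:-$(uname -m)}"     # machine doing the build (default: this one)
  if [ "$target_arch" != "$build_arch" ]; then
    printf '%s\n' "--host=${target_arch}-linux --build=${build_arch}-linux"
  fi
}
```

Usage on an x86_64 box, where it adds nothing: `./configure --prefix=/homes/kazi/glembek/share/openmpi-1.3.3-64 --with-sge --enable-shared --enable-static $(cross_flags x86_64)`. The command substitution is left unquoted on purpose so the two flags split into separate arguments.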
This is not the problem. We have both 32-bit and 64-bit machines, and
the problem occurs on both (i.e. also when omitting --host, --build,
etc.).
Is there any way to force the ssh before the (...) term?
Using SSH directly would bypass SGE's startup. What are your entries
for qrsh_daemon and so on in SGE's configuration? Which version of SGE
are you using?
qstat reports the version as "GE 6.2u4". Below is the qconf -sconf
dump.
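For the configuration question, only the remote-startup entries of the dump matter. In SGE 6.2 these are, as far as I know, the `_command` and `_daemon` settings for `qlogin`, `rlogin` and `rsh` (Reuti's "qrsh_daemon and so on"). A minimal sketch: the hypothetical filter `show_remote_startup` keeps just those lines of a configuration dump read on stdin.

```shell
#!/bin/sh
# Hypothetical filter: reduce a "qconf -sconf" dump (on stdin) to the
# entries that control how SGE starts remote tasks.
show_remote_startup() {
  grep -E '^(qlogin|rlogin|rsh)_(command|daemon)[[:space:]]'
}
```

Usage: `qconf -sconf | show_remote_startup`.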
But I think the real problem is that Open MPI assumes you are outside
of SGE and therefore uses a different startup method. Are you
resetting any of SGE's environment variables (like $JOB_ID) in your
custom starter method?
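One way to check this is to test, right before the job command runs, whether the SGE job variables are still exported. A sketch under a stated assumption: the list below (SGE_ROOT, ARC, PE_HOSTFILE, JOB_ID) contains variables SGE sets for parallel jobs and that I believe Open MPI's SGE support looks at; `check_sge_env` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: warn about SGE job variables missing from the
# environment that child processes (such as mpirun) will inherit.
# The variable list is an assumption; adjust it to your setup.
check_sge_env() {
  status=0
  for var in SGE_ROOT ARC PE_HOSTFILE JOB_ID; do
    # printenv only sees exported variables, i.e. what mpirun would see
    if ! printenv "$var" >/dev/null; then
      echo "check_sge_env: $var is not set" >&2
      status=1
    fi
  done
  return "$status"
}
```

Calling it in the starter script just before the `exec` line would show on stderr whether any of these variables got lost.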
I don't think Open MPI is unaware of SGE when it calls starter.sh.
The starter.sh looks like this:
#!/bin/sh
ulimit -S -c 0
ulimit -S -t unlimited
What about setting this (the core size) in the queue definition
instead? The runtime will be limited if you request -l s_rt=... in SGE
(or define a maximum in the queue definition) in addition to h_rt.
#echo "$@" >>/pub/tmp/starter.log
# start the job in this shell
exec "$@"
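To see whether $JOB_ID and friends survive into the job, the commented-out log line can be turned into something a bit more informative. This is a sketch of a debugging variant, not the original script: `STARTER_LOG` and the default /tmp/starter.log path are assumptions, and the original ulimit lines would stay at the top.

```shell
#!/bin/sh
# Debugging variant of starter.sh (a sketch): log the SGE variables the
# job actually sees, then hand over to the real command.
# STARTER_LOG is a hypothetical override; /tmp/starter.log is an assumed default.
log="${STARTER_LOG:-/tmp/starter.log}"
printf '%s JOB_ID=%s SGE_ROOT=%s PE_HOSTFILE=%s cmd=%s\n' \
  "$(date '+%Y-%m-%d %H:%M:%S')" \
  "${JOB_ID:-unset}" "${SGE_ROOT:-unset}" "${PE_HOSTFILE:-unset}" "$*" >>"$log"
# Replace this shell with the job command, as the original does.
exec "$@"
```

A line showing `JOB_ID=unset` for an MPI task would point at the starter (or something before it) dropping the SGE environment.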
loglevel log_warning
loglevel log_info
will often give more information (not in this case, but for some
other issues).
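The suggested change is a single line of the global configuration, normally edited interactively with `qconf -mconf`. As a sketch for previewing the edit, the hypothetical filter `set_loglevel` rewrites the `loglevel` entry of a `qconf -sconf` dump read on stdin and leaves every other line untouched.

```shell
#!/bin/sh
# Hypothetical filter: replace the value of the "loglevel" entry in a
# configuration dump on stdin, keeping the original column spacing.
set_loglevel() {
  sed "s/^\(loglevel[[:space:]][[:space:]]*\).*/\1$1/"
}
```

Usage: `qconf -sconf | set_loglevel log_info` shows the configuration as it would look after the change.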
-- Reuti