Hi,

On 01.12.2009 at 10:00, Ondrej Glembek wrote:

Reuti wrote:

./configure --prefix=/homes/kazi/glembek/share/openmpi-1.3.3-64
--with-sge --enable-shared --enable-static --host=x86_64-linux
--build=x86_64-linux NM=x86_64-linux-nm

Is there any list of valid values for --host, --build and NM - and what
is NM for? From the ./configure --help I would "assume" that one can
tell Open MPI to prepare to BUILD on a PPC platform, although I'm
issuing the command on an x86, and the result of the PPC compile should
be to run on x86_64. Maybe you can leave it out, as it's the same in
your case?

This is not the problem... We have both 32-bit and 64-bit machines and the
problem occurs on both (i.e. even when omitting --host, --build, etc.)...


Is there any way to force the ssh before the (...) term???

Using SSH directly would bypass SGE's startup. What are your entries for
qrsh_daemon and so on in SGE's configuration? Which version of SGE?

qstat reports version number as "GE 6.2u4"... Below is qconf -sconf dump.


But I think the real problem is that Open MPI assumes you are outside of SGE and so uses a different startup. Are you resetting any of SGE's
environment variables in your custom starter method (like $JOB_ID)?
I don't think Open MPI is unaware of SGE when it calls
starter.sh...
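
One way to check this rather than guess would be to let the starter method report which SGE variables it actually sees. The following is only a sketch: the function name check_sge_env is made up for illustration, and the list of variables (SGE_ROOT, ARC, PE_HOSTFILE, JOB_ID) is my assumption about what Open MPI's SGE support looks for, so adjust it if your version differs.

```shell
#!/bin/sh
# Sketch only: check_sge_env is a hypothetical helper, and the variable
# list below is an assumption about what Open MPI inspects to detect SGE.
# Prints one "missing: NAME" line per unset or empty variable and
# returns non-zero if at least one of them is missing.
check_sge_env() {
    missing=0
    for name in SGE_ROOT ARC PE_HOSTFILE JOB_ID; do
        # Indirect lookup of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    return $missing
}
```

Calling check_sge_env from the starter script just before the exec line, with its output appended to a log file such as the /pub/tmp/starter.log already used there, should show whether anything got lost on the way.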


The starter.sh looks like this:

$$$
#!/bin/sh

ulimit -S -c 0
ulimit -S -t unlimited

What about setting this (the core size) in the queue definition? The runtime will be limited if you request -l s_rt=... in SGE (or define a maximum in the queue definition), besides h_rt.
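
As a rough illustration of what I mean, the relevant entries in the queue configuration (as edited with qconf -mq for the queue in question) could look like the fragment below. The values are placeholders I picked for the example, not a recommendation for your site.

```
s_rt                  24:00:00
h_rt                  24:05:00
s_core                0
h_core                INFINITY
```

With s_core set to 0 in the queue, the corresponding ulimit line in the starter script should no longer be needed.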


#echo "$@" >>/pub/tmp/starter.log

# start the job in this shell
exec "$@"


loglevel                     log_warning

loglevel  log_info

will often give more info (not in this case, but in case of some other issues).

-- Reuti
