I definitely compiled the package with the --with-sge flag. Here is my configure command:

./configure --prefix=/homes/kazi/glembek/share/openmpi-1.3.3-64 --with-sge --enable-shared --enable-static --host=x86_64-linux --build=x86_64-linux NM=x86_64-linux-nm

One more interesting thing: when, by luck, SGE reserves the jobs on the same machine (i.e. an SMP scheme), everything works with no problem.

Is there any way to force ssh to be placed before the (...) term?

Thanks
Ondrej


Reuti wrote:
Am 30.11.2009 um 18:46 schrieb Ondrej Glembek:

Hi, thanks for the reply.

I dumped "$@" before calling exec, and here it is:


( test ! -r ./.profile || . ./.profile; PATH=/homes/kazi/glembek/share/openmpi-1.3.3-64/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/homes/kazi/glembek/share/openmpi-1.3.3-64/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; /homes/kazi/glembek/share/openmpi-1.3.3-64/bin/orted -mca ess env -mca orte_ess_jobid 3870359552 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri "3870359552.0;tcp://147.229.8.134:53727" --mca pls_gridengine_verbose 1 --output-filename mpi.log )
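A dump like the one above does not show where one argument ends and the next begins, which matters when deciding how the starter should treat the command. A minimal sketch of a helper that prints each argument on its own line follows; the name dump_args is hypothetical and not part of SGE or Open MPI.

```shell
# Hypothetical debugging helper for a starter method.
# Prints the argument count, then each argument in brackets with its index,
# so it is visible whether "( ... )" arrives as one string or as many words.
dump_args() {
    echo "argc=$#"
    i=1
    for arg in "$@"; do
        printf '%s: [%s]\n' "$i" "$arg"
        i=$((i + 1))
    done
}
```

In a starter script this could be called as `dump_args "$@" >> "$HOME/starter-args.log"` just before the exec line; the log path is only an example.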


It looks like the line gets constructed in orte/mca/plm/rsh/plm_rsh_module.c and depends on the shell...

Still, I wonder why mpiexec calls starter.sh. I thought the starter was supposed to call the script that wraps the call to mpiexec.

Correct. This will happen for the master node of this job, i.e. where the jobscript is executed. But it will also be used for the qrsh -inherit calls. I wonder about one thing: I see only a call to "orted" and not the above sub-shell on my machines. Did you compile Open MPI with --with-sge?

The original call above would be "ssh node_xy ( test ! ....)", which seems to work for ssh and rsh.
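The difference can be reproduced locally. In this small sketch `sh -c` stands in for the remote shell that ssh or rsh would start (an assumption made for illustration; no remote host is involved): a shell parses the parentheses, while exec looks for a program with that literal name and fails.

```shell
# A sub-shell expression of the same shape as the one mpiexec builds.
cmd='( echo from-subshell )'

# A shell parses the text, so the parentheses are understood as a sub-shell.
via_shell=$(sh -c "$cmd")

# exec treats the whole string as a program name; run it in a sub-shell
# so the failure does not terminate the calling shell.
via_exec=$( ( exec "$cmd" ) 2>/dev/null || echo "exec failed" )

echo "$via_shell"
echo "$via_exec"
```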

Just one note: with the starter script you will lose the PATH and LD_LIBRARY_PATH settings, as a new shell is created. It might be necessary to set them again in your starter method.
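As a sketch of what setting them again could look like in a starter method, assuming the install prefix from the configure command quoted in this thread (the variable name OMPI_PREFIX is hypothetical):

```shell
# Prefix taken from the configure command in this thread; adjust for other installs.
OMPI_PREFIX=/homes/kazi/glembek/share/openmpi-1.3.3-64

# Put the Open MPI binaries first on PATH.
PATH="$OMPI_PREFIX/bin:$PATH"

# Prepend the library directory; append the old value only if it was non-empty,
# so no stray trailing colon ends up in the search path.
LD_LIBRARY_PATH="$OMPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

export PATH LD_LIBRARY_PATH
```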

-- Reuti



Am I not right?
Ondrej


Reuti wrote:
Hi,
Am 30.11.2009 um 16:33 schrieb Ondrej Glembek:
We are using a custom starter method in our SGE to launch our jobs. It
looks something like this:

#!/bin/sh

# ... we do a whole bunch of stuff here

# start the job in this shell
exec "$@"
The "$@" should be replaced by the path to the jobscript (qsub) or the command (qrsh) plus the given options. For the tasks spread to other nodes I get as argument: " orted -mca ess env -mca orte_ess_jobid ...", and also no ". ./.profile". So I wonder where the ". ./.profile" is coming from. Can you put a `sleep 60` or similar before the `exec ...` and grep the constructed line from `ps -e f` before it crashes?
-- Reuti
The trouble is that mpiexec passes a command which looks like this:

( . ./.profile ..... )

which, however, is not a valid exec argument...

Is there any way to tell mpiexec to run it in a separate script? Any
idea how to solve this?
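One possible workaround on the starter side, offered only as an untested sketch: let the starter hand a sub-shell expression to a shell instead of passing it to exec. The function name launch is hypothetical, and the sketch assumes the expression reaches the starter as unparsed text, so that joining the arguments with spaces and re-parsing them gives the intended command.

```shell
# Hypothetical replacement for the bare exec line of a starter method.
launch() {
    case "$1" in
        "("*)
            # The command is a sub-shell expression such as "( . ./.profile; ... )".
            # exec cannot run it directly, so let a shell parse the joined text.
            exec /bin/sh -c "$*"
            ;;
        *)
            # Ordinary case: a jobscript or command plus its options.
            exec "$@"
            ;;
    esac
}
```

The starter script would then end with `launch "$@"` in place of `exec "$@"`. Since /bin/sh is a fresh shell, any environment the command needs still has to be exported before this point.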

Thanks
Ondrej Glembek

--

  Ondrej Glembek, PhD student  E-mail: glem...@fit.vutbr.cz
  UPGM FIT VUT Brno, L226      Web:    http://www.fit.vutbr.cz/~glembek
  Bozetechova 2, 612 66        Phone:  +420 54114-1292
  Brno, Czech Republic         Fax:    +420 54114-1290

  ICQ: 93233896
  GPG: C050 A6DC 7291 6776 9B69 BB11 C033 D756 6F33 DE3C
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users