On 03.10.2008 at 10:46, Jaime Perea wrote:

Hello again.

Since I already had a 6.1 version of SGE, I reverted to it
and included the hacks (ssh, sshd -i and qlogin_wrap). This
way both the interactive qsh and qrsh and the batch qsub
worked with Open MPI.
For me this is a solution, but I'm still curious about what was
going on in 6.2. I will see if there is a list like this one for
SGE.

Sure there is, but we will meet again ;-)

http://gridengine.sunsource.net/maillist.html

It's the us...@gridengine.sunsource.net

-- Reuti



Thanks a lot.

--
Jaime Perea

On Thursday, 2 October 2008, Rolf Vandevaart wrote:
On 10/02/08 11:18, Reuti wrote:
On 02.10.2008 at 16:51, Jaime Perea wrote:
Hi

builtin, do I have to change them to ssh and sshd as in sge 6.1?

I have always used only rsh, as ssh doesn't provide a Tight Integration
with correct accounting (unless you compiled SGE with -tight-ssh on
your own).

But it would be worth a try with either the rsh or ssh stuff, as the
builtin starter is a new feature of SGE 6.2.

-- Reuti

As was mentioned, SGE 6.2 has a new Integrated Job Starter, so rsh and
ssh do not need to be used to start jobs on remote nodes. This is
the recommended way of starting jobs, as it is faster than ssh and more
scalable than rsh. You also do not need any hacks for proper job
accounting, as was needed for ssh.

Under the covers, Open MPI uses qrsh to start the MPI jobs on all the
nodes.

Not sure if that helps, but just wanted to mention that information.

Rolf

Thanks again

--
Jaime Perea

On Thursday, 2 October 2008, Reuti wrote:
On 02.10.2008 at 16:12, Jaime Perea wrote:
Hi again, thanks for the answer

Actually I took the definition of the pe from the openmpi
webpage, in my case

qconf -sp orte
pe_name            orte
slots              24
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE

Our sge is version 6.2 and openmpi was configured with
the --with-sge switch of course.
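
Whether a given Open MPI build really contains the SGE support can be checked from the component list printed by ompi_info, which shows gridengine entries when the support is compiled in. A minimal sketch, assuming the helper name has_sge_support (hypothetical, not part of Open MPI) and reading the ompi_info output from standard input:

```shell
#!/bin/sh
# has_sge_support: succeed (exit status 0) if the `ompi_info` style
# output on stdin mentions a gridengine component.
# The function name is made up for this example.
has_sge_support() {
    grep -q 'gridengine'
}
```

On a node with Open MPI in the PATH it could be used as `ompi_info | has_sge_support && echo "SGE support present"`.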

In SGE 6.2 two types of remote startup are implemented. Which one
are you using (builtin or the former settings for each command) in
the SGE configuration?
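
One way to find out is to look at the remote-startup entries in the output of qconf -sconf. The sketch below assumes that the relevant entry is rsh_command (the one used when qrsh is given a command to run) and that it reads either builtin or the path of an rsh/ssh client. The helper name startup_mode is hypothetical.

```shell
#!/bin/sh
# startup_mode: read `qconf -sconf` style output on stdin and print the
# value of rsh_command ("builtin" or a command path), or "unset" when
# no rsh_command line is present. Helper name is made up for this example.
startup_mode() {
    awk '$1 == "rsh_command" { $1 = ""; sub(/^[ \t]+/, ""); print; found = 1; exit }
         END { if (!found) print "unset" }'
}
```

On the qmaster or a submit host it could be called as `qconf -sconf | startup_mode`.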

-- Reuti

Regards

--
Jaime Perea

On Thursday, 2 October 2008, Reuti wrote:
Hi,

On 02.10.2008 at 15:37, Jaime Perea wrote:
Hello,

I am having some problems with a combination of openmpi+sge6.2

Currently I'm working with the 1.3a1r19666 openmpi release and
the

AFAIK, you have to enable SGE support in Open MPI 1.3 during its
compilation.

myrinet gm libraries (2.1.19), but the problem was the same with
the prior 1.3 version. In short, I'm able to send jobs to a queue
via qrsh,
more or less this way,

qrsh -cwd -V -q para -pe orte 6 mpirun -np 6 ctiming

It should also work without specifying the number of slots a
second time, i.e.:

qrsh -cwd -V -q para -pe orte 6 mpirun ctiming

ctiming is a small test program and in this way it works, but if
I try to
send the same task by using qsub on a script like this one

#!/bin/sh
#$ -pe orte 6

This PE has just /bin/true for start-/stop_proc_args?

#$ -q para
#$ -cwd
#
mpirun -np $NSLOTS  /model/jaime/ctiming

mpirun /model/jaime/ctiming

It fails with a message like this,
..............

error reading job context from "qlogin_starter"

qlogin_starter should of course only be started with a qlogin
command in SGE.

--------------------------------------------------------------------------
A daemon (pid 11207) died unexpectedly with status 1 while
attempting
to launch so we are aborting.

There may be more information reported by the environment (see
above).

This may be because the daemon was unable to find all the needed
shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to
have the
location of the shared libraries on the remote nodes and this
will automatically be forwarded to the remote nodes.

.............

I know that LD_LIBRARY_PATH is not the problem, since I checked
that all
the environment is present.... any idea?

For previous releases of SGE and Open MPI I was able to make
them work
together with a few wrappers,

Which version of SGE are you using?

-- Reuti

but now the integration looks much better!
This happens only when sending Open MPI jobs.

Thanks and all the best

---

           Jaime D. Perea Duarte. <jaime at iaa dot es>
             Linux registered user #10472

           Dep. Astrofisica Extragalactica.
           Instituto de Astrofisica de Andalucia (CSIC)
           Apdo. 3004, 18080 Granada, Spain.
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
