Pak Lui wrote:
Geoff Galitz wrote:
Hello,
On the following system:
OpenMPI 1.1.1
SGE 6.0 (with tight integration)
Scientific Linux 4.3
Dual Dual-Core Opterons
MPI jobs are oversubscribing the nodes. No matter where jobs are
launched by the scheduler, they always stack up on the first node
(node00) and continue to stack even though the system load exceeds 6
(on a 4-processor box). Each node is defined with 4 slots and 4 max
slots. The MPI jobs are launched via
"mpirun -np (some-number-of-processors)" from within the scheduler.
Hi Geoff,
I think SGE support first appeared in 1.2, not in 1.1.1. Unless you
modified your build to include the gridengine ras/pls modules from
v1.2, you are probably not using the SGE tight integration.
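
One way to confirm whether a given build has the gridengine modules is to look through the output of ompi_info for them. The following is only a rough sketch: the helper name gridengine_frameworks is made up for illustration, and it assumes ompi_info prints one component per line in the form "MCA ras: gridengine (...)", which may differ between versions.

```python
import re


def gridengine_frameworks(ompi_info_text):
    """Return the set of MCA frameworks (e.g. 'ras', 'pls') that list
    a component named 'gridengine' in the given ompi_info output.

    Assumption: each component appears on its own line as
    'MCA <framework>: <component> (...)'.
    """
    found = set()
    for line in ompi_info_text.splitlines():
        match = re.match(r"\s*MCA\s+(\w+):\s+gridengine\b", line)
        if match:
            found.add(match.group(1))
    return found
```

To use it, capture the output of ompi_info on a compute node (for example with subprocess.run(["ompi_info"], capture_output=True, text=True).stdout) and pass the text in. An empty set would suggest the build has no gridengine support, in which case mpirun would not pick up the slot allocation from SGE.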
I've experimented a bit with backporting the 1.2 gridengine tight
integration modules to 1.1* and it seems to work nicely.
If you're feeling adventurous, here are some unofficial packages and
information related to this:
http://staff.csc.fi/~oplehto/openmpi-gridengine/
Olli-Pekka
--
Olli-Pekka Lehto, Systems Specialist, Systems Services, CSC
PO Box 405 02101 Espoo, Finland; tel +358 9 457 2215, fax +358 9 4572302
CSC is the Finnish IT Center for Science, www.csc.fi,
e-mail: olli-pekka.le...@csc.fi