Dear Reuti and Ralph,

Below is the output of the run for Open MPI 1.8.3 with this line:

mpirun -np $NSLOTS --display-map --display-allocation --cpus-per-proc 1 $exe


master=cn6050
PE=orte
JOB_ID=2482923
Got 32 slots.
slots:
cn6050 16 par6.q@cn6050 <NULL>
cn6045 16 par6.q@cn6045 <NULL>
Tue Nov 11 12:37:37 GMT 2014

======================   ALLOCATED NODES   ======================
        cn6050: slots=16 max_slots=0 slots_inuse=0 state=UP
=================================================================
Data for JOB [57374,1] offset 0

========================   JOB MAP   ========================

Data for node: cn6050  Num slots: 16   Max slots: 0    Num procs: 32
        Process OMPI jobid: [57374,1] App: 0 Process rank: 0
        Process OMPI jobid: [57374,1] App: 0 Process rank: 1

…
        Process OMPI jobid: [57374,1] App: 0 Process rank: 31
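For reference, the header lines above (master= through the date) are printed by the jobscript itself before mpirun starts; a csh preamble along these lines would produce them (a sketch, not the verbatim script, which is not shown here):

#!/bin/csh
# Sketch of the jobscript preamble; the actual script is not shown in this thread.
echo "master=`hostname -s`"   # master node of the parallel job
echo "PE=$PE"                 # parallel environment, set by GridEngine
echo "JOB_ID=$JOB_ID"         # set by GridEngine
echo "Got $NSLOTS slots."     # total number of granted slots
echo "slots:"
cat $PE_HOSTFILE              # one line per node: host  slots  queue  processor range
date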


Also
ompi_info | grep grid
gives                 MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.8.3)
and
ompi_info | grep psm
gives                 MCA mtl: psm (MCA v2.0, API v2.0, Component v1.8.3)
because the interconnect is TrueScale/QLogic.

and

setenv OMPI_MCA_mtl "psm"

is set in the script. This is the PE configuration:

pe_name           orte
slots             4000
user_lists        NONE
xuser_lists       NONE
start_proc_args   /bin/true
stop_proc_args    /bin/true
allocation_rule   $fill_up
control_slaves    TRUE
job_is_first_task FALSE
urgency_slots     min
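For completeness, the PE definition and its attachment to the queue can be double-checked on a submit host with standard GridEngine commands (par6.q taken from the output above):

qconf -sp orte                    # print the PE definition quoted above
qconf -sq par6.q | grep pe_list   # confirm that par6.q references the orte PE
qconf -mp orte                    # edit the PE, e.g. the allocation_rule, if needed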

Many thanks

Henk


From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: 10 November 2014 05:07
To: Open MPI Users
Subject: Re: [OMPI users] oversubscription of slots with GridEngine

You might also add the --display-allocation flag to mpirun so we can see what it 
thinks the allocation looks like. If there are only 16 slots on the node, it 
seems odd that OMPI would assign 32 procs to it unless it thinks there is only 
1 node in the job, and oversubscription is allowed (which it won't be by 
default if it read the GE allocation).


On Nov 9, 2014, at 9:56 AM, Reuti <re...@staff.uni-marburg.de> wrote:

Hi,


On 09.11.2014 at 18:20, SLIM H.A. <h.a.s...@durham.ac.uk> wrote:

We switched on hyperthreading on our cluster, which has two eight-core sockets 
per node (32 threads per node).

We configured GridEngine with 16 slots per node, leaving the 16 extra threads 
for kernel processes, but this apparently does not work. A printout of the 
GridEngine hostfile shows that for a 32-slot job, 16 slots are placed on each 
of two nodes, as expected. But including the Open MPI --display-map option 
shows that all 32 processes are incorrectly placed on the head node.

You mean the master node of the parallel job, I assume.


Here is part of the output

master=cn6083
PE=orte

What allocation rule was defined for this PE, and is "control_slaves yes" set?


JOB_ID=2481793
Got 32 slots.
slots:
cn6083 16 par6.q@cn6083 <NULL>
cn6085 16 par6.q@cn6085 <NULL>
Sun Nov  9 16:50:59 GMT 2014
Data for JOB [44767,1] offset 0

========================   JOB MAP   ========================

Data for node: cn6083  Num slots: 16   Max slots: 0    Num procs: 32
      Process OMPI jobid: [44767,1] App: 0 Process rank: 0
      Process OMPI jobid: [44767,1] App: 0 Process rank: 1
...
      Process OMPI jobid: [44767,1] App: 0 Process rank: 31

=============================================================

I found some related mailings about a new oversubscription warning in 1.8.2, 
and I tried a few options to stop Open MPI from using the extra threads for 
MPI tasks, without success, e.g. variants of

--cpus-per-proc 1
--bind-to-core

and some others. GridEngine treats hardware threads as cores==slots (?), but 
the content of $PE_HOSTFILE suggests it distributes the slots sensibly, so it 
seems some Open MPI option is required to get 16 cores per node?
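For reference, the explicit per-node mapping and binding would be spelled 
roughly as follows (a sketch with a generic executable ./a.out; flag names 
differ between the series, and it only helps if Open MPI actually sees both 
nodes of the allocation):

# Open MPI 1.8.x: 16 processes per node, each bound to one core
mpirun -np 32 --map-by ppr:16:node --bind-to core ./a.out

# Open MPI 1.6.x spelling of the same intent
mpirun -np 32 -npernode 16 --bind-to-core ./a.out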

Was Open MPI configured with --with-sge?

-- Reuti


I tried 1.8.2, 1.8.3 and also 1.6.5.

Thanks for any clarification that anyone can give.

Henk

