Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-17 Thread Dave Love
Ralph Castain writes: >> On Nov 13, 2014, at 3:36 PM, Dave Love wrote: >> >> Ralph Castain writes: >> >> cn6050 16 par6.q@cn6050 >> cn6045 16 par6.q@cn6045 The above looks like the PE_HOSTFILE. So it should be 16 slots per node. >>> >>> Hey Reuti >>> >>> Is that the sta

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-14 Thread Ralph Castain
> On Nov 13, 2014, at 3:36 PM, Dave Love wrote: > > Ralph Castain writes: > > cn6050 16 par6.q@cn6050 > cn6045 16 par6.q@cn6045 >>> >>> The above looks like the PE_HOSTFILE. So it should be 16 slots per node. >> >> Hey Reuti >> >> Is that the standard PE_HOSTFILE format? I’m looki

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-13 Thread Dave Love
Ralph Castain writes: >> I think there's a problem with documentation at least not being >> explicit, and it would really help to have it clarified unless I'm >> missing some. > > Not quite sure I understand this comment - the problem is that we > aren’t correctly reading the allocation, as evide

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-13 Thread Dave Love
Ralph Castain writes: cn6050 16 par6.q@cn6050 cn6045 16 par6.q@cn6045 >> >> The above looks like the PE_HOSTFILE. So it should be 16 slots per node. > > Hey Reuti > > Is that the standard PE_HOSTFILE format? I’m looking at the ras/gridengine > module, and it looks like it is expecti

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-12 Thread Ralph Castain
> On Nov 12, 2014, at 7:15 AM, Dave Love wrote: > > Ralph Castain writes: > >> You might also add the --display-allocation flag to mpirun so we can >> see what it thinks the allocation looks like. If there are only 16 >> slots on the node, it seems odd that OMPI would assign 32 procs to it >> u

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-12 Thread Dave Love
"SLIM H.A." writes: > Dear Reuti and Ralph > > Below is the output of the run for openmpi 1.8.3 with this line > > mpirun -np $NSLOTS --display-map --display-allocation --cpus-per-proc 1 $exe -np is redundant with tight integration unless you're using fewer than NSLOTS from SGE. > ompi_info | g

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-12 Thread Dave Love
Reuti writes: >> If so, I’m wondering if that NULL he shows in there is the source of the >> trouble. The parser doesn’t look like it would handle that very well, though >> I’d need to test it. Is that NULL expected? Or is the NULL not really in the >> file? > > I must admit here: for me the f

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-12 Thread Dave Love
Ralph Castain writes: > You might also add the --display-allocation flag to mpirun so we can > see what it thinks the allocation looks like. If there are only 16 > slots on the node, it seems odd that OMPI would assign 32 procs to it > unless it thinks there is only 1 node in the job, and oversubs

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-11 Thread Reuti
0 Process rank: 0 >>>>>> Process OMPI jobid: [57374,1] App: 0 Process rank: 1 >>>>>> >>>>>> … >>>>>> Process OMPI jobid: [57374,1] App: 0 Process rank: 31 >>>>>> >>>>>> >>>>

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-11 Thread Ralph Castain
: [57374,1] App: 0 Process rank: 31 >>>>> >>>>> >>>>> Also >>>>> ompi_info | grep grid >>>>> gives MCA ras: gridengine (MCA v2.0, API v2.0, Component >>>>> v1.8.3) >>>>> and >>

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-11 Thread Reuti
MCA ras: gridengine (MCA v2.0, API v2.0, Component >>>> v1.8.3) >>>> and >>>> ompi_info | grep psm >>>> gives MCA mtl: psm (MCA v2.0, API v2.0, Component v1.8.3) >>>> because the interconnect is TrueScale/QLogic >

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-11 Thread Ralph Castain
nent v1.8.3) >>> because the interconnect is TrueScale/QLogic >>> >>> and >>> >>> setenv OMPI_MCA_mtl "psm" >>> >>> is set in the script. This is the PE >>> >>> pe_name orte >>> slots 4

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-11 Thread Reuti
> pe_name orte >> slots 4000 >> user_lists NONE >> xuser_lists NONE >> start_proc_args /bin/true >> stop_proc_args /bin/true >> allocation_rule $fill_up >> control_slaves TRUE >> job_is_first_task FALSE >

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-11 Thread Ralph Castain
4000 > user_lists NONE > xuser_lists NONE > start_proc_args /bin/true > stop_proc_args /bin/true > allocation_rule $fill_up > control_slaves TRUE > job_is_first_task FALSE > urgency_slots min > > Many thanks > > Henk > >

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-11 Thread Dave Love
"SLIM H.A." writes: > We switched on hyper threading on our cluster with two eight core > sockets per node (32 threads per node). Assuming that's Xeon-ish hyperthreading, the best advice is not to. It will typically hurt performance of HPC applications, not least if it defeats core binding, and

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-11 Thread SLIM H.A.
Behalf Of Ralph Castain Sent: 10 November 2014 05:07 To: Open MPI Users Subject: Re: [OMPI users] oversubscription of slots with GridEngine You might also add the --display-allocation flag to mpirun so we can see what it thinks the allocation looks like. If there are only 16 slots on the node, it

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-10 Thread Ralph Castain
You might also add the --display-allocation flag to mpirun so we can see what it thinks the allocation looks like. If there are only 16 slots on the node, it seems odd that OMPI would assign 32 procs to it unless it thinks there is only 1 node in the job, and oversubscription is allowed (which it

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-09 Thread Reuti
Hi, > Am 09.11.2014 um 18:20 schrieb SLIM H.A. : > > We switched on hyper threading on our cluster with two eight core sockets per > node (32 threads per node). > > We configured gridengine with 16 slots per node to allow the 16 extra > threads for kernel process use but this apparently does

[OMPI users] oversubscription of slots with GridEngine

2014-11-09 Thread SLIM H.A.
We switched on hyper threading on our cluster with two eight core sockets per node (32 threads per node). We configured gridengine with 16 slots per node to allow the 16 extra threads for kernel process use but this apparently does not work. Printout of the gridengine hostfile shows that for a
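
The hostfile lines quoted in this thread (for example `cn6050 16 par6.q@cn6050`) can be tallied per host to check what allocation GridEngine actually granted. Below is a minimal sketch, not Open MPI's own parser: it assumes each line starts with a host name followed by an integer slot count, and it ignores everything after those two fields. The function name `parse_pe_hostfile` is hypothetical.

```python
from collections import OrderedDict


def parse_pe_hostfile(text):
    """Return an ordered mapping of host name -> granted slots.

    Assumption: every non-empty line begins with "<host> <slots>";
    any further fields (queue name and whatever follows) are ignored.
    """
    slots = OrderedDict()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        host, count = fields[0], int(fields[1])
        # A host may appear on more than one line, so accumulate.
        slots[host] = slots.get(host, 0) + count
    return slots


if __name__ == "__main__":
    # Sample lines taken from the thread.
    sample = "cn6050 16 par6.q@cn6050\ncn6045 16 par6.q@cn6045\n"
    allocation = parse_pe_hostfile(sample)
    for host, n in allocation.items():
        print(host, n)
    print("total slots:", sum(allocation.values()))
```

In a job script the same function could be fed the contents of the file named by `$PE_HOSTFILE`, and the per-host totals compared with what `mpirun --display-allocation` reports.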