Hi Ralph,

Thanks for the tip.  With the command

$ qsub -pe mpi 8 -binding linear:1 myScript.com

I get the output

[exec6:29172] System has detected external process binding to cores 0008
[exec6:29172] ras:gridengine: JOB_ID: 59282
[exec6:29172] ras:gridengine: PE_HOSTFILE: 
/usr/sge/default/spool/exec6/active_jobs/59282.1/pe_hostfile
[exec6:29172] ras:gridengine: exec6.cluster.stats.local: PE_HOSTFILE shows 
slots=2
[exec6:29172] ras:gridengine: exec2.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec6:29172] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec6:29172] ras:gridengine: exec3.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec6:29172] ras:gridengine: exec4.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec6:29172] ras:gridengine: exec7.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec6:29172] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE shows 
slots=1

Presumably that means OMPI is detecting the external binding okay. If so, 
that confirms my problem is an issue with how GE sets the processor 
affinity -- essentially the controlling sge_shepherd process on each physical 
exec node gets bound to the requested number of cores (in this case 1), 
so any child process (i.e. the OMPI parallel processes) ends up bound to 
the same core. What we really need is for GE to set the binding on each 
execution node according to the number of parallel processes that will run 
there.  Not sure this is doable currently...
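For anyone who wants to see the inheritance mechanism itself, here's a small
standalone sketch (plain Python on Linux, nothing to do with SGE's actual
code): a parent binds itself to one core, then spawns a child, and the child
reports the same one-core mask -- the same way an OMPI rank inherits
sge_shepherd's binding.

```python
import os
import subprocess
import sys

# Mimic sge_shepherd: bind this (parent) process to a single core.
os.sched_setaffinity(0, {0})

# Spawn a child and ask it for its own affinity mask.
child_affinity = subprocess.run(
    [sys.executable, "-c",
     "import os; print(sorted(os.sched_getaffinity(0)))"],
    capture_output=True, text=True,
).stdout.strip()

# The child never set any affinity itself, yet it is confined to core 0.
print("child affinity:", child_affinity)
```

On my understanding, nothing the child does by default escapes that mask,
which is why all the ranks on a node pile onto the one core the shepherd
was given.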

Cheers,

Chris


--
Dr Chris Jewell
Department of Statistics
University of Warwick
Coventry
CV4 7AL
UK
Tel: +44 (0)24 7615 0778
