Hi,

On 15.11.2010, at 17:06, Chris Jewell wrote:

> Hi Ralph,
> 
> Thanks for the tip.  With the command
> 
> $ qsub -pe mpi 8 -binding linear:1 myScript.com
> 
> I get the output
> 
> [exec6:29172] System has detected external process binding to cores 0008
> [exec6:29172] ras:gridengine: JOB_ID: 59282
> [exec6:29172] ras:gridengine: PE_HOSTFILE: /usr/sge/default/spool/exec6/active_jobs/59282.1/pe_hostfile
> [exec6:29172] ras:gridengine: exec6.cluster.stats.local: PE_HOSTFILE shows slots=2
> [exec6:29172] ras:gridengine: exec2.cluster.stats.local: PE_HOSTFILE shows slots=1
> [exec6:29172] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE shows slots=1
> [exec6:29172] ras:gridengine: exec3.cluster.stats.local: PE_HOSTFILE shows slots=1
> [exec6:29172] ras:gridengine: exec4.cluster.stats.local: PE_HOSTFILE shows slots=1
> [exec6:29172] ras:gridengine: exec7.cluster.stats.local: PE_HOSTFILE shows slots=1
> [exec6:29172] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE shows slots=1
> 
> Presumably that means that OMPI is detecting the external binding okay. If
> so, that confirms that my problem is an issue with how GE sets the processor
> affinity: essentially, the controlling sge_shepherd process on each physical
> exec node gets bound to the requested number of cores (in this case 1), so
> any child process (i.e. the OMPI parallel processes) ends up bound to the
> same core. What we really need is for GE to set the binding on each
> execution node according to the number of parallel processes that will run
> there. Not sure this is doable currently...
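
You could verify that directly on one of the exec nodes while the job is
running: if the inheritance is the problem, the affinity mask of the
sge_shepherd and of the MPI processes under it will be identical. A rough,
untested sketch (taskset is from util-linux; the pgrep pattern and the
placeholder PID are only examples):

$ taskset -cp $(pgrep -f sge_shepherd | head -n 1)
$ taskset -cp <PID of one of your MPI processes on that node>

or look at Cpus_allowed_list in /proc/<pid>/status for the processes in
question.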

On SGE's side, the problem could be that the local MPI processes on each
slave node are threads of the local daemon and don't invoke an additional
`qrsh -inherit ...`, so they all inherit the binding that was set for the
single shepherd. Does it work fine if you have only one MPI process per node?
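
To check both cases from inside the job itself, each rank could report its
own binding, e.g. with something like this in the job script (an untested
sketch; it assumes taskset is installed on the nodes and simply prints the
affinity of the shell that each rank starts):

$ mpirun sh -c 'echo "$(hostname): $(taskset -cp $$)"'

With -binding linear:1 all ranks on a node should then report the same
single core; with only one MPI process per node, one core per node would be
no problem.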

-- Reuti


> Cheers,
> 
> Chris
> 
> 
> --
> Dr Chris Jewell
> Department of Statistics
> University of Warwick
> Coventry
> CV4 7AL
> UK
> Tel: +44 (0)24 7615 0778

