Just to give my understanding of the problem:

On 15.11.2010 at 19:57, Terry Dontje wrote:

> On 11/15/2010 11:08 AM, Chris Jewell wrote:
>>> Sorry, I am still trying to grok all of your email and what problem you 
>>> are trying to solve. So is the issue trying to have two jobs with 
>>> processes on the same node be able to bind their processes to different 
>>> resources? Like core 1 for the first job and cores 2 and 3 for the 2nd job? 
>>> 
>>> --td 
>>> 
>> That's exactly it.  Each MPI process needs to be bound to 1 processor in a 
>> way that reflects GE's slot allocation scheme.
>> 
>> 
> I actually don't think that I got it.  So you give two cases:
> 
> Case 1:
> $ qsub -pe mpi 8 -binding pe linear:1 myScript.com
> 
> and my pe_hostfile looks like:
> 
> exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1

Shouldn't two cores be reserved here for exec6, since it got two slots?


> exec1.cluster.stats.local 1 batch.q@exec1.cluster.stats.local 0,1
> exec7.cluster.stats.local 1 batch.q@exec7.cluster.stats.local 0,1
> exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1
> exec4.cluster.stats.local 1 batch.q@exec4.cluster.stats.local 0,1
> exec3.cluster.stats.local 1 batch.q@exec3.cluster.stats.local 0,1
> exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1
> 
> 
> Case 2:
> Notice that, because I have specified the -binding pe linear:1, each 
> execution node binds processes for the job_id to one core.  If I have 
> -binding pe linear:2, I get:
> 
> exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1:0,2
> exec1.cluster.stats.local 1 batch.q@exec1.cluster.stats.local 0,1:0,2
> exec7.cluster.stats.local 1 batch.q@exec7.cluster.stats.local 0,1:0,2
> exec4.cluster.stats.local 1 batch.q@exec4.cluster.stats.local 0,1:0,2
> exec3.cluster.stats.local 1 batch.q@exec3.cluster.stats.local 0,1:0,2
> exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1:0,2
> exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1:0,2
> 
> Is your complaint really the fact that exec6 has been allocated two slots but 
> there seems to only be one slot worth of resources allocated

All entries are wrong except exec6. The other hosts have one slot each, so they should get only one core assigned.
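
To make the expectation concrete, here is a rough sketch of the check I have in mind: count the cores listed in the binding column of each pe_hostfile line and compare that with the slot count. It assumes the fourth column is a colon-separated list of socket,core pairs, as in the listings above; the function names are made up for illustration and are not part of SGE or Open MPI.

```python
def parse_pe_hostfile(text):
    """Parse lines of the form: host slots queue binding.

    Assumption: binding is a colon-separated list of "socket,core"
    pairs, e.g. "0,1:0,2". Lines with another shape are skipped.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue
        host, slots, queue, binding = fields
        cores = [tuple(int(x) for x in pair.split(","))
                 for pair in binding.split(":")]
        entries.append((host, int(slots), queue, cores))
    return entries


def mismatched_hosts(entries):
    """Return hosts whose number of bound cores differs from their slots."""
    return [host for host, slots, _queue, cores in entries
            if len(cores) != slots]
```

Under this expectation, the first listing would flag only exec6 (two slots, one core), and the second would flag every host except exec6.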

-- Reuti


> to it (i.e., in case 1 exec6 has only one core and in case 2 it has two, 
> where maybe you'd expect 2 and 4 cores allocated respectively)?
> 
> -- 
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.781.442.2631
> Oracle - Performance Technologies
> 95 Network Drive, Burlington, MA 01803
> Email terry.don...@oracle.com
> 

