Jeff Squyres wrote:
> On Oct 31, 2007, at 1:18 AM, Murat Knecht wrote:
>
>   
>> Yes I am, (master and child 1 running on the same machine).
>> But knowing the oversubscribing issue, I am using  
>> mpi_yield_when_idle which should fix precisely this problem, right?
>>     
>
> It won't *fix* the problem -- you're still oversubscribing the nodes,  
> so things will run slowly.  But it should help, in that the processes  
> will yield regularly.
>   
Yes. By "fix" I meant "solving the blocking problem by letting the
other processes get some CPU time".

> What version of OMPI are you using?
>   
I am using Open MPI 1.2.4.

>> I did give both machines multiple slots, so OpenMPI
>> "knows" that the possibility for more oversubscription may arise.
>>     
>
> I'm not sure what you mean by this -- you should not "lie" to OMPI  
> and tell it that it has more slots than it physically does.  But keep  
> in mind that, as I described in my first mail, OMPI does not  
> currently re-compute the number of processes on a host as you spawn  
> (which can lead to the oversubscription problem).  If you're  
> explicitly setting yield_when_idle, that *may* help, but we may or  
> may not be explicitly propagating that value to spawned  
> processes...  I'll have to check.
>   
In the hostfile I specified, for each host, the number of physically
available cores. Together with the "yield" setting, I hoped the
oversubscription would be recognised even when the "oversubscribing"
processes are started dynamically.
I re-checked the high/low parameter, and it seems to be all right. It
would be odd for this to be the cause anyway, as the problem seems to
depend on the host and the order.
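
For illustration, the hostfile and yield setting described above might
look roughly like the sketch below. The host names, slot counts, file
name and program name are placeholders, not the actual configuration;
only the "slots=" hostfile syntax and the mpi_yield_when_idle MCA
parameter are taken from Open MPI itself.

```
# hostfile (hypothetical hosts; slots = number of physical cores)
node1 slots=2
node2 slots=2

# launch the master with the yield setting enabled
# (./master is a placeholder for the spawning program)
mpirun --hostfile hostfile --mca mpi_yield_when_idle 1 -np 1 ./master
```

As noted in the quoted reply, whether this setting reaches dynamically
spawned processes is not guaranteed in this version.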

Thanks,
Murat
