On Jan 12, 2010, at 16:57, Eugene Loh wrote:

> Jeff Squyres wrote:
> 
>> It would be very strange for nanosleep to cause a problem for Open MPI -- it 
>> shouldn't interfere with any of Open MPI's mechanisms.  Double check that 
>> your my_barrier() function is actually working properly -- removing the 
>> nanosleep() shouldn't affect the correctness of your barrier.
>> 
> I read Gijsbert's e-mail differently.  Apparently, the issue is not MPI/OMPI 
> at all, but a hang inside nanosleep.
> 
>> On Dec 31, 2009, at 1:15 PM, Gijsbert Wiesenekker wrote:
>> 
>>> I only recently learned about the OMPI_MCA_mpi_yield_when_idle variable; I 
>>> still have to test whether it is an alternative to my workaround.
>>>   
> mpi_yield_when_idle does not free the CPU up very much.  Open MPI still polls 
> fairly aggressively, and the yield() call gives the CPU up only briefly.  It's 
> a weak and probably ungratifying solution for your problem.
> 
>>> Meanwhile I seem to have found the cause of the problem ...
>>> ... rather than OpenMPI being the problem, nanosleep is the culprit because 
>>> the call to it seems to hang.
>>>   
> So, "we" (OMPI community) are off the hook?  Problem is in nanosleep?  "We" 
> are relieved (or confused about what you're reporting)!

Just to confirm: the problem is indeed not with OpenMPI (so the OMPI community 
is off the hook) but with nanosleep() on Fedora Core 12 (the bug has not yet 
been fixed in the current kernel/glibc).
Using MPI_Barrier with OMPI_MCA_mpi_yield_when_idle set helps somewhat, but it 
is not ideal, as it still uses a lot of CPU.
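
For anyone who finds this thread later: the workaround amounts to a hand-rolled 
barrier that polls MPI_Iprobe and sleeps between polls instead of spinning inside 
MPI_Barrier. Below is a minimal sketch of that idea; the tag, the 1 ms interval, 
and the function names are illustrative placeholders, not the exact my_barrier() 
referred to above.

/* Sketch of a "low-CPU" barrier: every rank notifies rank 0, rank 0 releases
 * everyone, and the waiting side polls MPI_Iprobe with a nanosleep() between
 * polls instead of busy-waiting.  Tag and sleep interval are arbitrary. */
#define _POSIX_C_SOURCE 199309L
#include <mpi.h>
#include <time.h>

#define BARRIER_TAG 4242                  /* arbitrary tag, assumed otherwise unused */

static void wait_for(int source, MPI_Comm comm)
{
    int flag = 0;
    MPI_Status status;
    struct timespec ts = { 0, 1000000 };  /* sleep 1 ms between polls */

    while (!flag) {
        MPI_Iprobe(source, BARRIER_TAG, comm, &flag, &status);
        if (!flag)
            nanosleep(&ts, NULL);         /* the call that was reported to hang on FC12 */
    }
}

void my_barrier_sketch(MPI_Comm comm)
{
    int rank, size, i;
    char dummy = 0;

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    if (rank == 0) {
        for (i = 1; i < size; i++) {      /* collect a notification from every rank */
            wait_for(i, comm);
            MPI_Recv(&dummy, 1, MPI_CHAR, i, BARRIER_TAG, comm, MPI_STATUS_IGNORE);
        }
        for (i = 1; i < size; i++)        /* then release everyone */
            MPI_Send(&dummy, 1, MPI_CHAR, i, BARRIER_TAG, comm);
    } else {
        MPI_Send(&dummy, 1, MPI_CHAR, 0, BARRIER_TAG, comm);
        wait_for(0, comm);
        MPI_Recv(&dummy, 1, MPI_CHAR, 0, BARRIER_TAG, comm, MPI_STATUS_IGNORE);
    }
}

(mpi_yield_when_idle can be enabled either through the OMPI_MCA_mpi_yield_when_idle 
environment variable or with --mca mpi_yield_when_idle 1 on the mpirun command line, 
but as noted above it still keeps the core fairly busy.)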

Gijsbert

