Re: [OMPI users] OpenMPI problem on Fedora Core 12

2010-03-07 Thread Gijsbert Wiesenekker
On Jan 12, 2010, at 16:57, Eugene Loh wrote:
> Jeff Squyres wrote:
>
>> It would be very strange for nanosleep to cause a problem for Open MPI -- it
>> shouldn't interfere with any of Open MPI's mechanisms. Double check that
>> your my_barrier() function is actually working properly -- remov

Re: [OMPI users] OpenMPI problem on Fedora Core 12

2010-01-12 Thread Eugene Loh
Jeff Squyres wrote: It would be very strange for nanosleep to cause a problem for Open MPI -- it shouldn't interfere with any of Open MPI's mechanisms. Double check that your my_barrier() function is actually working properly -- removing the nanosleep() shouldn't affect the correctness of yo

Re: [OMPI users] OpenMPI problem on Fedora Core 12

2010-01-12 Thread Jeff Squyres
It would be very strange for nanosleep to cause a problem for Open MPI -- it shouldn't interfere with any of Open MPI's mechanisms. Double check that your my_barrier() function is actually working properly -- removing the nanosleep() shouldn't affect the correctness of your barrier. If you'v

Re: [OMPI users] OpenMPI problem on Fedora Core 12

2009-12-31 Thread Gijsbert Wiesenekker
First of all, the reason that I have created a CPU-friendly version of MPI_Barrier is that my program is asymmetric (so some of the nodes can easily have to wait for several hours) and that it is I/O bound. My program uses MPI mainly to synchronize I/O and to share some counters between the nodes,

Re: [OMPI users] OpenMPI problem on Fedora Core 12

2009-12-14 Thread Ashley Pittman
On Sun, 2009-12-13 at 19:04 +0100, Gijsbert Wiesenekker wrote:
> The following routine gives a problem after some (not reproducible)
> time on Fedora Core 12. The routine is a CPU usage friendly version of
> MPI_Barrier.

There are some proposals for Non-blocking collectives before the MPI forum cu

Re: [OMPI users] OpenMPI problem on Fedora Core 12

2009-12-14 Thread Eugene Loh
Let's start with this: You generate non-blocking sends (MPI_Isend). Those sends are not completed anywhere. So, strictly speaking, they don't need to be executed. In practice, even if they are executed, they should be "completed" from the user program's point of view (MPI_Test, MPI_Wait, MP

[OMPI users] OpenMPI problem on Fedora Core 12

2009-12-13 Thread Gijsbert Wiesenekker
The following routine gives a problem after some (not reproducible) time on Fedora Core 12. The routine is a CPU usage friendly version of MPI_Barrier. The verbose output shows that if the problem occurs one of the (not reproducible) nodes does not receive a message from one of the other (not re
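
The routine from the original post is not reproduced in this listing. As a rough illustration of the general idea discussed in the thread (checking for an event and calling nanosleep() between checks instead of spinning), here is a minimal sketch in plain C. It is not the poster's my_barrier() and it does not call MPI at all. The names ready_fn, poll_with_sleep and countdown_ready are hypothetical and exist only for this example. In an MPI program the predicate would presumably wrap a completion check such as MPI_Test on a request that is later completed, which is the point raised in the replies about MPI_Isend.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical predicate type: returns true once the awaited event has
 * happened. In an MPI program this would wrap something like MPI_Test;
 * that part is an assumption and is not shown here. */
typedef bool (*ready_fn)(void *ctx);

/* Check `ready` up to max_polls times, sleeping interval_ns nanoseconds
 * after each unsuccessful check so the CPU is not kept busy.
 * Returns the number of sleeps performed before the predicate became
 * true, or -1 if it never did within max_polls checks. */
long poll_with_sleep(ready_fn ready, void *ctx, long interval_ns, long max_polls)
{
    assert(ready != NULL);
    /* tv_nsec must stay below one second. */
    assert(interval_ns >= 0 && interval_ns < 1000000000L);

    struct timespec pause = { .tv_sec = 0, .tv_nsec = interval_ns };

    for (long polls = 0; polls < max_polls; ++polls) {
        if (ready(ctx))
            return polls;
        nanosleep(&pause, NULL); /* yield the CPU between checks */
    }
    return -1;
}

/* Stand-in predicate for demonstration only: reports "ready" once the
 * counter pointed to by ctx has been decremented to zero. */
bool countdown_ready(void *ctx)
{
    int *remaining = ctx;
    if (*remaining == 0)
        return true;
    --*remaining;
    return false;
}
```

With a counter starting at 3, poll_with_sleep(countdown_ready, &n, 1000, 10) sleeps three times and then returns 3. The max_polls bound is a choice made for this sketch so that a lost message shows up as a -1 return instead of a silent hang.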