Re: [OMPI users] OpenMPI 1.2.5 race condition / core dump with MPI_Reduce and MPI_Gather

2008-02-29 Thread Gleb Natapov
On Thu, Feb 28, 2008 at 04:53:11PM -0500, George Bosilca wrote: > In this particular case, I don't think the solution is that obvious. If > you look at the stack in the original email, you will notice how we get > into this. The problem here is that the FREE_LIST_WAIT is used to get a > fragment ...

Re: [OMPI users] OpenMPI 1.2.5 race condition / core dump with MPI_Reduce and MPI_Gather

2008-02-29 Thread John Markus Bjørndalen
George Bosilca wrote: [...] I don't think the root crashed. I guess that one of the other nodes crashed, the root got a bad socket (which is what the first error message seems to indicate), and got terminated. As the output is not synchronized between the nodes, one cannot rely on its order ...

Re: [OMPI users] OpenMPI 1.2.5 race condition / core dump with MPI_Reduce and MPI_Gather

2008-02-28 Thread George Bosilca
On Feb 28, 2008, at 2:45 PM, John Markus Bjørndalen wrote: Hi, and thanks for the feedback everyone. George Bosilca wrote: Brian is completely right. Here is a more detailed description of this problem. [...] On the other side, I hope that not many users write such applications. This is ...

Re: [OMPI users] OpenMPI 1.2.5 race condition / core dump with MPI_Reduce and MPI_Gather

2008-02-28 Thread George Bosilca
In this particular case, I don't think the solution is that obvious. If you look at the stack in the original email, you will notice how we get into this. The problem here is that the FREE_LIST_WAIT is used to get a fragment to store an unexpected message. If this macro returns NULL (in other ...
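For context, the difference between the two macros under discussion, in a simplified sketch (illustrative only, not the actual OpenMPI source; take_from() is an invented helper):

    /* Non-blocking: one attempt; the item comes back NULL if the list is empty. */
    #define FREE_LIST_GET(list, item)          \
        do {                                   \
            (item) = take_from(list);          \
        } while (0)

    /* Blocking: drive the progress engine until an item frees up. The
     * opal_progress() call here is what re-enters the receive callbacks
     * and makes unbounded recursion possible. */
    #define FREE_LIST_WAIT(list, item)         \
        do {                                   \
            (item) = take_from(list);          \
            while (NULL == (item)) {           \
                opal_progress();               \
                (item) = take_from(list);      \
            }                                  \
        } while (0)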

Re: [OMPI users] OpenMPI 1.2.5 race condition / core dump with MPI_Reduce and MPI_Gather

2008-02-28 Thread Christian Bell
On Thu, 28 Feb 2008, Gleb Natapov wrote: > The trick is to call progress only from functions that are called > directly by a user process. Never call progress from a callback function. > The main offenders of this rule are calls to OMPI_FREE_LIST_WAIT(). They > should be changed to OMPI_FREE_LIST_GET ...
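At a call site inside a receive callback, the change Gleb proposes would look roughly like this (a sketch under the 1.2-era macro signatures; the list name and the defer_fragment() helper are invented):

    /* Before: may block and recurse into opal_progress().
     *   OMPI_FREE_LIST_WAIT(&frag_list, item, rc);
     * After: never blocks, but the caller must handle NULL. */
    OMPI_FREE_LIST_GET(&frag_list, item, rc);
    if (NULL == item) {
        /* No fragment available: remember the work and retry later from
         * the top-level progress loop instead of recursing into it. */
        defer_fragment(frag);
        return;
    }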

Re: [OMPI users] OpenMPI 1.2.5 race condition / core dump with MPI_Reduce and MPI_Gather

2008-02-28 Thread John Markus Bjørndalen
Hi, and thanks for the feedback everyone. George Bosilca wrote: Brian is completely right. Here is a more detailed description of this problem. [...] On the other side, I hope that not many users write such applications. This is the best way to completely kill the performance of any MPI implementation ...

Re: [OMPI users] OpenMPI 1.2.5 race condition / core dump with MPI_Reduce and MPI_Gather

2008-02-28 Thread Gleb Natapov
On Wed, Feb 27, 2008 at 10:01:06AM -0600, Brian W. Barrett wrote: > The only solution to this problem is to suck it up and audit all the code > to eliminate calls to opal_progress() in situations where infinite > recursion can result. It's going to be long and painful, but there's no > quick ...

Re: [OMPI users] OpenMPI 1.2.5 race condition / core dump with MPI_Reduce and MPI_Gather

2008-02-27 Thread George Bosilca
Brian is completely right. Here is a more detailed description of this problem. Upon receiving a fragment from the BTL (lower layer), we try to match it with an MPI request. If the match fails, then we get a fragment from the free_list (via the blocking call to FREE_LIST_WAIT) and copy the ...
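In outline, that receive path looks something like the following (a paraphrase of the description above; all names are illustrative, not the real ob1 PML symbols):

    void btl_recv_callback(frag_t *frag)
    {
        request_t *req = try_match(frag);  /* match against posted receives */
        if (NULL != req) {
            deliver(req, frag);            /* expected: copy into the user buffer */
        } else {
            buffer_t *buf;
            /* Unexpected message: the payload needs its own storage.
             * FREE_LIST_WAIT blocks here, driving opal_progress() and
             * potentially re-entering this very callback. */
            FREE_LIST_WAIT(&unexpected_list, buf);
            copy_payload(buf, frag);
            enqueue_unexpected(buf);
        }
    }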

Re: [OMPI users] OpenMPI 1.2.5 race condition / core dump with MPI_Reduce and MPI_Gather

2008-02-27 Thread Jeff Squyres
Bummer; ok. On Feb 27, 2008, at 11:01 AM, Brian W. Barrett wrote: I played with this to fix some things in ORTE at one point, and it's a very dangerous slope -- you're essentially guaranteeing you have a deadlock case. Now instead of running off the stack, you'll deadlock. The issue is that ...

Re: [OMPI users] OpenMPI 1.2.5 race condition / core dump with MPI_Reduce and MPI_Gather

2008-02-27 Thread Brian W. Barrett
I played with this to fix some things in ORTE at one point, and it's a very dangerous slope -- you're essentially guaranteeing you have a deadlock case. Now instead of running off the stack, you'll deadlock. The issue is that we call opal_progress to wait for something to happen deep in the bowels ...
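Spelled out, the cycle looks like this (schematic; function names simplified):

    /*  MPI_Gather()                      (user call)
     *    -> opal_progress()
     *      -> btl_recv_callback()        (unexpected fragment arrives)
     *        -> FREE_LIST_WAIT()         (free list empty, so it blocks)
     *          -> opal_progress()        (re-entered from the callback)
     *            -> btl_recv_callback()  (another unexpected fragment)
     *              -> ...                (repeats until the stack runs out)
     */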

Re: [OMPI users] OpenMPI 1.2.5 race condition / core dump with MPI_Reduce and MPI_Gather

2008-02-27 Thread Jeff Squyres
Gleb / George -- Is there an easy way for us to put a cap on max recursion down in opal_progress? Just put in a counter in opal_progress() such that if it exceeds some max value, return success without doing anything (if opal_progress_event_flag indicates that nothing *needs* to be done)?
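A minimal sketch of that counter (the depth guard, its limit, and the int return are invented for illustration and may not match the real opal_progress() signature):

    #define PROGRESS_MAX_DEPTH 32          /* invented cap */

    int opal_progress(void)
    {
        static int depth = 0;              /* single-threaded sketch only */
        int events = 0;

        if (depth >= PROGRESS_MAX_DEPTH) {
            return 0;                      /* report success; unwind rather than recurse */
        }
        depth++;
        /* ... poll the BTLs, fire callbacks, count events ... */
        depth--;
        return events;
    }

As Brian points out above, the catch is that a FREE_LIST_WAIT loop spinning on a capped opal_progress() no longer makes progress at all, so the stack overflow is traded for a deadlock.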

[OMPI users] OpenMPI 1.2.5 race condition / core dump with MPI_Reduce and MPI_Gather

2008-02-22 Thread John Markus Bjørndalen
Hi, I ran into a bug when running a few microbenchmarks for OpenMPI. I had thrown in Reduce and Gather for sanity checking, but OpenMPI crashed when running those operations. Usually, this would happen when I reached around 12-16 nodes. My current crash-test code looks like this (I've removed ...
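The archive truncates the code, but a minimal reproducer in the spirit described (looping Reduce and Gather with no synchronization between iterations; iteration count and payload are guesses, not the original program) would be:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, size, i;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int *gathered = malloc(size * sizeof(int));
        for (i = 0; i < 10000; i++) {
            int in = rank, sum = 0;
            MPI_Reduce(&in, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
            MPI_Gather(&in, 1, MPI_INT, gathered, 1, MPI_INT, 0, MPI_COMM_WORLD);
            /* Sanity check at the root: the reduced sum of all ranks
             * should equal 0 + 1 + ... + (size-1). */
            if (0 == rank && sum != size * (size - 1) / 2) {
                fprintf(stderr, "bad sum at iteration %d\n", i);
            }
        }
        free(gathered);
        MPI_Finalize();
        return 0;
    }

The absence of any barrier between iterations is presumably what lets the non-root ranks run ahead and flood the root with unexpected fragments, triggering the FREE_LIST_WAIT recursion discussed above.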