Re: [OMPI users] Segmentation fault with SLURM and non-local nodes

2011-02-08 Thread Ralph Castain
I would personally suggest not reconfiguring your system simply to support a particular version of OMPI. The only difference between the 1.4 and 1.5 series wrt slurm is that we changed a few things to support a more recent version of slurm. It is relatively easy to backport that code to the 1.4

Re: [OMPI users] Segmentation fault with SLURM and non-local nodes

2011-02-08 Thread Michael Curtis
On 09/02/2011, at 9:16 AM, Ralph Castain wrote:
> See below
>
> On Feb 8, 2011, at 2:44 PM, Michael Curtis wrote:
>> On 09/02/2011, at 2:17 AM, Samuel K. Gutierrez wrote:
>>> Hi Michael,
>>>
>>> You may have tried to send some debug information to the list, but it appears to

Re: [OMPI users] Segmentation fault with SLURM and non-local nodes

2011-02-08 Thread Ralph Castain
See below
On Feb 8, 2011, at 2:44 PM, Michael Curtis wrote:
> On 09/02/2011, at 2:17 AM, Samuel K. Gutierrez wrote:
>> Hi Michael,
>>
>> You may have tried to send some debug information to the list, but it appears to have been blocked. Compressed text output of the backtrace text

Re: [OMPI users] Segmentation fault with SLURM and non-local nodes

2011-02-08 Thread Michael Curtis
On 09/02/2011, at 2:17 AM, Samuel K. Gutierrez wrote:
> Hi Michael,
>
> You may have tried to send some debug information to the list, but it appears to have been blocked. Compressed text output of the backtrace text is sufficient.
Odd, I thought I sent it to you directly. In any case,

Re: [OMPI users] Segmentation fault with SLURM and non-local nodes

2011-02-08 Thread Michael Curtis
On 09/02/2011, at 2:38 AM, Ralph Castain wrote:
> Another possibility to check - are you sure you are getting the same OMPI version on the backend nodes? When I see it work on local node, but fail multi-node, the most common problem is that you are picking up a different OMPI version due

Re: [OMPI users] Unknown overhead in "mpirun -am ft-enable-cr"

2011-02-08 Thread Joshua Hursey
There are a few reasons why this might be occurring. Did you build with the '--enable-ft-thread' option? If so, it looks like I didn't move over the thread_sleep_wait adjustment from the trunk - the thread was being a bit too aggressive. Try adding the following to your command line options, an

Re: [OMPI users] Segmentation fault with SLURM and non-local nodes

2011-02-08 Thread Ralph Castain
Another possibility to check - are you sure you are getting the same OMPI version on the backend nodes? When I see it work on local node, but fail multi-node, the most common problem is that you are picking up a different OMPI version due to path differences on the backend nodes.
On Feb 8, 201

Re: [OMPI users] Segmentation fault with SLURM and non-local nodes

2011-02-08 Thread Samuel K. Gutierrez
Hi Michael,
You may have tried to send some debug information to the list, but it appears to have been blocked. Compressed text output of the backtrace text is sufficient.
Thanks,
--
Samuel K. Gutierrez
Los Alamos National Laboratory
On Feb 7, 2011, at 8:38 AM, Samuel K. Gutierrez wrote:

[OMPI users] Unknown overhead in "mpirun -am ft-enable-cr"

2011-02-08 Thread Nguyen Toan
Hi all, I am using the latest version of OpenMPI (1.5.1) and BLCR (0.8.2). I found that when running an application, which uses MPI_Isend, MPI_Irecv and MPI_Wait, enabling C/R, i.e. using "-am ft-enable-cr", the application runtime is much longer than the normal execution with mpirun (no checkpoint