Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Joshua Baker-LePain
On Tue, 13 Mar 2012 at 5:31pm, Ralph Castain wrote FWIW: I have a Centos6 system myself, and I have no problems running OMPI on it (1.4 or 1.5). I can try building it the same way you do and see what happens. I can run as many threads as I like on a single system with no problems, even if th…

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Joshua Baker-LePain
On Tue, 13 Mar 2012 at 11:28pm, Gutierrez, Samuel K wrote Can you rebuild without the "--enable-mpi-threads" option and try again. I did and still got segfaults (although w/ slightly different backtraces). See the response I just sent to Ralph. -- Joshua Baker-LePain QB3 Shared Cluster Sysadmin…

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Joshua Baker-LePain
On Tue, 13 Mar 2012 at 6:05pm, Ralph Castain wrote I started playing with this configure line on my Centos6 machine, and I'd suggest a couple of things: 1. drop the --with-libltdl=external ==> not a good idea 2. drop --with-esmtp ==> useless unless you really want pager messages notifying you of problems…

Re: [OMPI users] MPI_TAG_UB printing zero with Intel Compiler

2012-03-13 Thread George Bosilca
MPI_TAG_UB is not a constant, it is a predefined attribute. As such it should be accessed using the attribute accessors (MPI_COMM_GET_ATTR, page 229 in the MPI 2.2 standard). george. On Mar 13, 2012, at 13:51, Timothy Stitt wrote: > Hi Jeff, > > I went through the procedure of compiling and running…
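George's point can be illustrated with a short C sketch (not from the thread; requires an MPI installation and mpicc): the attribute value arrives as a pointer to int via MPI_Comm_get_attr, and flag reports whether the attribute is set.

```c
/* Sketch: reading MPI_TAG_UB as a predefined attribute rather than
 * treating it as a constant. Compile with mpicc; run under mpirun. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int *tag_ub;  /* MPI hands back a pointer to the attribute value */
    int flag;

    MPI_Init(&argc, &argv);

    /* MPI_TAG_UB is a keyval naming the attribute, not the bound itself */
    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_TAG_UB, &tag_ub, &flag);
    if (flag)
        printf("MPI_TAG_UB = %d\n", *tag_ub);

    MPI_Finalize();
    return 0;
}
```

Printing the keyval MPI_TAG_UB directly (as in the Fortran snippet later in the thread) prints the handle, not the tag upper bound, which is why the accessor matters.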

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Gustavo Correa
Hi Joshua, Can your int counter "i" get so large? > for(i=0; i<=1000000000000; i++) I may be mistaken, but 1,000,000,000,000 = 10**12 > 2**31 = 2,147,483,647 = maximum int. Unless they are 64-bit longs. Just a thought. Gus Correa On Mar 13, 2012, at 4:54 PM, Joshua Baker-LePain wrote:

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Ralph Castain
I started playing with this configure line on my Centos6 machine, and I'd suggest a couple of things: 1. drop the --with-libltdl=external ==> not a good idea 2. drop --with-esmtp ==> useless unless you really want pager messages notifying you of problems 3. drop --enable-mpi-threads for now
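Ralph's three suggestions boil down to a shorter configure line. A hypothetical trimmed invocation might look like the following (the install prefix is a placeholder, and --with-sge is an assumption based on the SGE integration discussed in the thread, not quoted from it):

```shell
# Hypothetical trimmed build, per Ralph's advice.
# Dropped: --with-libltdl=external, --with-esmtp, --enable-mpi-threads
./configure --prefix=/opt/openmpi-1.4.5 --with-sge
make -j4 all && make install
```

Keeping the configure line minimal also narrows the search space when bisecting a segfault like the one reported here.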

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Ralph Castain
Hmmm… you might try removing the --enable-mpi-threads from the configure to be safe. FWIW: I have a Centos6 system myself, and I have no problems running OMPI on it (1.4 or 1.5). I can try building it the same way you do and see what happens. On Mar 13, 2012, at 5:22 PM, Joshua Baker-LePain wrote:

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Gutierrez, Samuel K
Can you rebuild without the "--enable-mpi-threads" option and try again. Thanks, Sam On Mar 13, 2012, at 5:22 PM, Joshua Baker-LePain wrote: > On Tue, 13 Mar 2012 at 10:57pm, Gutierrez, Samuel K wrote > >> Fooey. What compiler are you using to build Open MPI and how are you >> configuring your build?…

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Joshua Baker-LePain
On Tue, 13 Mar 2012 at 10:57pm, Gutierrez, Samuel K wrote Fooey. What compiler are you using to build Open MPI and how are you configuring your build? I'm using gcc as packaged by RH/CentOS 6.2: [jlb@opt200 1.4.5-2]$ gcc --version gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3) I actually tried…

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Joshua Baker-LePain
On Tue, 13 Mar 2012 at 5:06pm, Ralph Castain wrote Out of curiosity: could you send along the mpirun cmd line you are using to launch these jobs? I'm wondering if the SGE integration itself is the problem, and it only shows up in the sm code. It's about as simple as it gets: mpirun -np $NSLOTS…
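For context, a minimal SGE submit script of the kind Joshua describes might look like this (the parallel-environment name and program name are placeholders, not taken from the thread):

```shell
#!/bin/sh
# Hypothetical SGE job script; "orte" is a conventional PE name, not
# necessarily the one on Joshua's cluster.
#$ -pe orte 8      # request 8 slots
#$ -cwd

# With SGE tight integration, Open MPI's mpirun discovers the allocated
# hosts itself; $NSLOTS is the slot count SGE hands the job.
mpirun -np $NSLOTS ./my_mpi_program
```

Under tight integration the -np $NSLOTS is largely redundant (mpirun can infer the slot count from the SGE allocation), which is part of why Ralph suspects the integration layer rather than the command line.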

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Ralph Castain
Out of curiosity: could you send along the mpirun cmd line you are using to launch these jobs? I'm wondering if the SGE integration itself is the problem, and it only shows up in the sm code. On Mar 13, 2012, at 4:57 PM, Gutierrez, Samuel K wrote: > > On Mar 13, 2012, at 4:07 PM, Joshua Baker-LePain wrote:

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Gutierrez, Samuel K
On Mar 13, 2012, at 4:07 PM, Joshua Baker-LePain wrote: > On Tue, 13 Mar 2012 at 9:15pm, Gutierrez, Samuel K wrote >> Any more information surrounding your failures in 1.5.4 is greatly appreciated. > I'm happy to provide, but what exactly are you looking for? The test code…

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Joshua Baker-LePain
On Tue, 13 Mar 2012 at 9:15pm, Gutierrez, Samuel K wrote Any more information surrounding your failures in 1.5.4 is greatly appreciated. I'm happy to provide, but what exactly are you looking for? The test code I'm running is *very* simple: If you experience this type of failure with 1.4…

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Gutierrez, Samuel K
On Mar 13, 2012, at 2:54 PM, Joshua Baker-LePain wrote: > On Tue, 13 Mar 2012 at 7:53pm, Gutierrez, Samuel K wrote > >> The failure signature isn't exactly what we were seeing here at LANL, but >> there were misplaced memory barriers in Open MPI 1.4.3. Ticket 2619 talks >> about this issue (https://svn.open-mpi.org/trac/ompi/ticket/2619)…

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Joshua Baker-LePain
On Tue, 13 Mar 2012 at 7:53pm, Gutierrez, Samuel K wrote The failure signature isn't exactly what we were seeing here at LANL, but there were misplaced memory barriers in Open MPI 1.4.3. Ticket 2619 talks about this issue (https://svn.open-mpi.org/trac/ompi/ticket/2619). This doesn't explain, however, the failures that you are experiencing…

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Gutierrez, Samuel K
The failure signature isn't exactly what we were seeing here at LANL, but there were misplaced memory barriers in Open MPI 1.4.3. Ticket 2619 talks about this issue (https://svn.open-mpi.org/trac/ompi/ticket/2619). This doesn't explain, however, the failures that you are experiencing within Open MPI…

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Joshua Baker-LePain
On Tue, 13 Mar 2012 at 7:20pm, Gutierrez, Samuel K wrote Just to be clear, what specific version of Open MPI produced the provided backtrace? This smells like a missing memory barrier problem. The backtrace in my original post was from 1.5.4 -- I took the 1.5.4 source and put it into the 1.5…

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Gutierrez, Samuel K
Hi, Just to be clear, what specific version of Open MPI produced the provided backtrace? This smells like a missing memory barrier problem. -- Samuel K. Gutierrez Los Alamos National Laboratory On Mar 13, 2012, at 1:07 PM, Joshua Baker-LePain wrote: > I run a decent size (600+ nodes, 4000+ cores)…

[OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Joshua Baker-LePain
I run a decent size (600+ nodes, 4000+ cores) heterogeneous (multiple generations of x86_64 hardware) cluster. We use SGE (currently 6.1u4, which, yes, is pretty ancient) and just upgraded from CentOS 5.7 to 6.2. We had been using MPICH2 under CentOS 5, but I'd much rather use OpenMPI as packaged…

Re: [OMPI users] MPI_TAG_UB printing zero with Intel Compiler

2012-03-13 Thread Timothy Stitt
Hi Jeff, I went through the procedure of compiling and running, then copied the procedure verbatim from the command line (see below). [tstitt@memtfe] /pscratch/tstitt > more mpitag.f90 program mpitag use mpi implicit none integer :: err call MPI_INIT(err) print *, MPI_TAG_UB…

Re: [OMPI users] MPI_TAG_UB printing zero with Intel Compiler

2012-03-13 Thread Jeffrey Squyres
Tim -- I am unable to replicate this problem with a 1.4 build with icc. Can you share your test code? On Mar 10, 2012, at 7:30 PM, Timothy Stitt wrote: > Hi all, > > I was experimenting with MPI_TAG_UB in my code recently and found that its > value is set to 0 in my v1.4.3 and v1.4.5 builds

Re: [OMPI users] MPI_Testsome with incount=0, NULL array_of_indices and array_of_statuses causes MPI_ERR_ARG

2012-03-13 Thread Jeffrey Squyres
On Mar 9, 2012, at 5:17 PM, Jeremiah Willcock wrote: > On Open MPI 1.5.1, when I call MPI_Testsome with incount=0 and the two output > arrays NULL, I get an argument error (MPI_ERR_ARG). Is this the intended > behavior? If incount=0, no requests can complete, so the output arrays can > never…
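A defensive wrapper sidesteps the argument check by never handing NULL arrays to the library when there is nothing to test (a sketch, not from the thread; the helper name is hypothetical and it requires an MPI installation):

```c
/* Sketch: skip MPI_Testsome entirely when incount == 0, so an
 * implementation's argument checking never sees NULL arrays. */
#include <mpi.h>
#include <stddef.h>

int testsome_safe(int incount, MPI_Request reqs[], int *outcount,
                  int indices[], MPI_Status statuses[])
{
    if (incount == 0) {
        /* Nothing can complete. (Note: MPI itself reports
         * outcount = MPI_UNDEFINED when no active requests remain;
         * returning 0 here is this wrapper's convenience choice.) */
        *outcount = 0;
        return MPI_SUCCESS;
    }
    return MPI_Testsome(incount, reqs, outcount, indices,
                        statuses ? statuses : MPI_STATUSES_IGNORE);
}
```

Passing MPI_STATUSES_IGNORE instead of a NULL status array is the portable way to say "I don't need statuses", which avoids the ambiguity Jeremiah ran into.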