Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-14 Thread Ralph Castain
On Mar 14, 2012, at 5:44 PM, Reuti wrote:
> On 14.03.2012 at 23:48, Joshua Baker-LePain wrote:
>> On Wed, 14 Mar 2012 at 6:31pm, Reuti wrote
>>> I just tested with two different queues on two machines and a small
>>> mpihello and it is working as expected.
>> At this point the narrat

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-14 Thread Reuti
On 14.03.2012 at 23:48, Joshua Baker-LePain wrote:
> On Wed, 14 Mar 2012 at 6:31pm, Reuti wrote
>> I just tested with two different queues on two machines and a small mpihello
>> and it is working as expected.
> At this point the narrative is getting very confused, even for me. So I
> tr

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-14 Thread Ralph Castain
Something is very wrong - there can only be one orted on each node. Having two orteds on the same node for the same job guarantees that things will become confused and generally fail. I don't know enough SGE to advise you what's wrong with your job script, but it looks like OMPI thinks there ar

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-14 Thread Reuti
On 14.03.2012 at 18:30, Joshua Baker-LePain wrote:
> On Wed, 14 Mar 2012 at 9:33am, Reuti wrote
>>> I can run as many threads as I like on a single system with no problems,
>>> even if those threads are running at different nice levels.
>> How do they get different nice levels - you renic

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-14 Thread Joshua Baker-LePain
On Wed, 14 Mar 2012 at 6:31pm, Reuti wrote
> I just tested with two different queues on two machines and a small mpihello
> and it is working as expected.
At this point the narrative is getting very confused, even for me. So I tried to find a clear cut case where I can change one thing to flip

Re: [OMPI users] invalid write in opal_generic_simple_unpack

2012-03-14 Thread Jeffrey Squyres
On Mar 14, 2012, at 1:06 PM, Patrik Jonsson wrote:
> I think I tracked it down, though. The problem was in the boost.mpi
> [snip]
Yuck! Glad you tracked it down.
> I do have a more general question, though: Is there a good way to back
> out the location of the request object if I stop deep in t

Re: [OMPI users] invalid write in opal_generic_simple_unpack

2012-03-14 Thread Patrik Jonsson
On Wed, Mar 14, 2012 at 3:43 PM, Jeffrey Squyres wrote:
> On Mar 14, 2012, at 9:38 AM, Patrik Jonsson wrote:
>> I'm trying to track down a spurious segmentation fault that I'm
>> getting with my MPI application. I tried using valgrind, and after
>> suppressing the 25,000 errors in PMPI_Init_thre

Re: [OMPI users] invalid write in opal_generic_simple_unpack

2012-03-14 Thread Jeffrey Squyres
On Mar 14, 2012, at 9:38 AM, Patrik Jonsson wrote:
> I'm trying to track down a spurious segmentation fault that I'm
> getting with my MPI application. I tried using valgrind, and after
> suppressing the 25,000 errors in PMPI_Init_thread and associated
> Init/Finalize functions, I haven't looked

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-14 Thread Reuti
On 14.03.2012 at 17:44, Ralph Castain wrote:
> Hi Reuti
> I appreciate your help on this thread - I confess I'm puzzled by it. As you
> know, OMPI doesn't use SGE to launch the individual processes, nor does SGE
> even know they exist. All SGE is used for is to launch the OMPI daemons
> (or

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-14 Thread Joshua Baker-LePain
On Wed, 14 Mar 2012 at 9:33am, Reuti wrote I can run as many threads as I like on a single system with no problems, even if those threads are running at different nice levels. How do they get different nice levels - you renice them? I would assume that all start at the same of the parent. In

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-14 Thread Ralph Castain
Hi Reuti I appreciate your help on this thread - I confess I'm puzzled by it. As you know, OMPI doesn't use SGE to launch the individual processes, nor does SGE even know they exist. All SGE is used for is to launch the OMPI daemons (orteds). This is done as a single qrsh call, so won't all the

[OMPI users] invalid write in opal_generic_simple_unpack

2012-03-14 Thread Patrik Jonsson
Hi, I'm trying to track down a spurious segmentation fault that I'm getting with my MPI application. I tried using valgrind, and after suppressing the 25,000 errors in PMPI_Init_thread and associated Init/Finalize functions, I'm left with an uninitialized write in PMPI_Isend (which I saw is not un

Re: [OMPI users] AlltoallV (with some zero send count values)

2012-03-14 Thread Shamis, Pavel
> Can anyone tell me whether it is legal to pass zero values for some of the
> send count elements in an MPI_AlltoallV() call? I want to perform an
> all-to-all operation but for performance reasons do not want to send data to
> various ranks who don't need to receive any useful values. If it
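For context on the question quoted above: the MPI standard specifies the count arrays of MPI_Alltoallv as non-negative integers, so zero entries are permitted. The sketch below is not MPI code. It is a small pure-Python model of the count and displacement bookkeeping, meant only to show that a zero count selects an empty block and leaves the other blocks where they were. The names `displacements` and `alltoallv_model` are made up for this illustration and are not part of any MPI binding.

```python
def displacements(counts):
    """Exclusive prefix sum: offset of each peer's block in a packed buffer."""
    offsets = []
    total = 0
    for count in counts:
        offsets.append(total)
        total += count
    return offsets


def alltoallv_model(send_bufs, send_counts):
    """Model the data movement of an all-to-all-v exchange inside one process.

    send_bufs[i]      -- packed list of items that rank i sends
    send_counts[i][j] -- number of items rank i sends to rank j (0 is allowed)
    Returns recv_bufs, where recv_bufs[j] holds what rank j receives,
    ordered by sending rank.
    """
    nranks = len(send_bufs)
    recv_bufs = [[] for _ in range(nranks)]
    for src in range(nranks):
        displs = displacements(send_counts[src])
        for dst in range(nranks):
            start = displs[dst]
            count = send_counts[src][dst]
            # A zero count yields an empty slice, so nothing is transferred.
            recv_bufs[dst].extend(send_bufs[src][start:start + count])
    return recv_bufs


if __name__ == "__main__":
    # Three ranks; most pairs exchange nothing.
    counts = [[0, 2, 0], [1, 0, 1], [0, 0, 0]]
    bufs = [["a", "b"], ["c", "d"], []]
    print(alltoallv_model(bufs, counts))
```

In a real program the receive counts on each rank have to match the send counts of its peers, so a zero on the sending side needs a matching zero on the receiving side.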

Re: [OMPI users] MPI_TAG_UB printing zero with Intel Compiler

2012-03-14 Thread Timothy Stitt
Thanks guys for the advice. I had initially compiled my code with the PGI compiler which actually returned a large, non-zero, non-negative output, which is why I thought it was valid to use it directly in my code. I'll make sure to use the correct procedure now for querying that 'attribute'. Be

Re: [OMPI users] MPI_TAG_UB printing zero with Intel Compiler

2012-03-14 Thread Jeffrey Squyres
George is correct. Try this:
[5:21] svbu-mpi:~/mpi % cat tag-ub.f90
program mpitag
  use mpi
  implicit none
  integer :: err
  integer (KIND=MPI_ADDRESS_KIND) :: my_tag_ub
  logical flag
  call MPI_INIT(err)
  call MPI_COMM_GET_ATTR(MPI_COMM_WORLD, MPI_TAG_UB, my_tag

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-14 Thread Reuti
Hi,
On 14.03.2012 at 04:02, Joshua Baker-LePain wrote:
> On Tue, 13 Mar 2012 at 5:31pm, Ralph Castain wrote
>> FWIW: I have a Centos6 system myself, and I have no problems running OMPI on
>> it (1.4 or 1.5). I can try building it the same way you do and see what
>> happens.
> I can run a