Thanks for the info.  It turned out to be a problem with the software, and not
an Open MPI issue.

Ted

--- On Sun, 2/1/09, Jeff Squyres <jsquy...@cisco.com> wrote:
From: Jeff Squyres <jsquy...@cisco.com>
Subject: Re: [OMPI users] Question about compatibility issues
To: ted...@wag.caltech.edu, "Open MPI Users" <us...@open-mpi.org>
List-Post: users@lists.open-mpi.org
Date: Sunday, February 1, 2009, 3:28 AM

On Jan 26, 2009, at 4:57 PM, Ted Yu wrote:

> I'm new to this group.  I'm trying to implement a parallel quantum
> code called "Seqquest".
> I'm trying to figure out why running this code fails with the
> following error:
> 
> This job has allocated 2 cpus
> Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
> Failing at addr:(nil)
> [0] func:/usr/lib64/openmpi/libopal.so.0 [0x393af21dc5]
> [1] func:/lib64/tls/libpthread.so.0 [0x393b80c4f0]
> [2] func:/project/source/seqquest/seqquest_source_v261i/hive_CentOS4.5_parallel/build_261i/quest_ompi.x [0x4f5cfd]
> [3] func:/project/source/seqquest/seqquest_source_v261i/hive_CentOS4.5_parallel/build_261i/quest_ompi.x(rhosave_+0x120) [0x4f6a8a]
> [4] func:/project/source/seqquest/seqquest_source_v261i/hive_CentOS4.5_parallel/build_261i/quest_ompi.x(MAIN__+0xb710) [0x431770]
> [5] func:/project/source/seqquest/seqquest_source_v261i/hive_CentOS4.5_parallel/build_261i/quest_ompi.x(main+0xe) [0xa717ee]
> [6] func:/lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x393b11c3fb]
> [7] func:/project/source/seqquest/seqquest_source_v261i/hive_CentOS4.5_parallel/build_261i/quest_ompi.x(free+0x3a) [0x425fca]
> *** End of error message ***
> mpiexec: Warning: task 0 died with signal 11 (Segmentation fault).
> 
> 
> Trying to debug this code, I noticed that the math library is an Intel
> math library, but all of the other code, including scalapack and blacs,
> was compiled using the GNU compiler.  Will there be compatibility issues?


There *could* be.  Have you tried to compile everything with the GNU compiler?

You might also try to examine what exactly in free() is going bad -- are you
passing a bad address to free?  Can you run the code through a debugger and/or
examine corefiles?
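
On the corefile point: many systems set the core-file size limit to 0 by
default, so a crash like the one above may leave nothing to examine.  A
minimal sketch, assuming a POSIX system, of raising that limit from inside
a program follows.  The helper name enable_core_dumps is hypothetical and
is not part of Open MPI or Seqquest; a Fortran code would need a small C
wrapper to call it, and setting "ulimit -c unlimited" in the shell that
launches the job is the usual alternative.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit to the hard
 * limit so that a fatal signal (e.g. SIGSEGV) can leave a core file.
 * Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit lim;

    /* Read the current soft and hard limits for core files. */
    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("getrlimit");
        return -1;
    }

    /* An unprivileged process may raise the soft limit only up to the
     * hard limit, so use the hard limit as the new soft limit. */
    lim.rlim_cur = lim.rlim_max;

    if (setrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```

Calling this once near the start of the program, before any MPI work,
should be enough.  If the hard limit itself is 0, the call still succeeds
but no core file will be written; that limit has to be raised by the
administrator or the job launcher.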

--Jeff Squyres
Cisco Systems




      
