Thanks for the info. It turned out to be a problem with the software, not an Open MPI issue.
Ted

--- On Sun, 2/1/09, Jeff Squyres <jsquy...@cisco.com> wrote:

From: Jeff Squyres <jsquy...@cisco.com>
Subject: Re: [OMPI users] Question about compatibility issues
To: ted...@wag.caltech.edu, "Open MPI Users" <us...@open-mpi.org>
Date: Sunday, February 1, 2009, 3:28 AM

On Jan 26, 2009, at 4:57 PM, Ted Yu wrote:

> I'm new to this group. I'm trying to implement a parallel quantum code called "Seqquest".
> I'm trying to figure out the cause of the following error when running this code:
>
> This job has allocated 2 cpus
> Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
> Failing at addr:(nil)
> [0] func:/usr/lib64/openmpi/libopal.so.0 [0x393af21dc5]
> [1] func:/lib64/tls/libpthread.so.0 [0x393b80c4f0]
> [2] func:/project/source/seqquest/seqquest_source_v261i/hive_CentOS4.5_parallel/build_261i/quest_ompi.x [0x4f5cfd]
> [3] func:/project/source/seqquest/seqquest_source_v261i/hive_CentOS4.5_parallel/build_261i/quest_ompi.x(rhosave_+0x120) [0x4f6a8a]
> [4] func:/project/source/seqquest/seqquest_source_v261i/hive_CentOS4.5_parallel/build_261i/quest_ompi.x(MAIN__+0xb710) [0x431770]
> [5] func:/project/source/seqquest/seqquest_source_v261i/hive_CentOS4.5_parallel/build_261i/quest_ompi.x(main+0xe) [0xa717ee]
> [6] func:/lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x393b11c3fb]
> [7] func:/project/source/seqquest/seqquest_source_v261i/hive_CentOS4.5_parallel/build_261i/quest_ompi.x(free+0x3a) [0x425fca]
> *** End of error message ***
> mpiexec: Warning: task 0 died with signal 11 (Segmentation fault).
>
> While debugging, I noticed that the math library is an Intel math library, but all of the code, including ScaLAPACK and BLACS, was compiled with the GNU compiler. Will there be compatibility issues?

There *could* be. Have you tried compiling everything with the GNU compiler?

You might also try to examine what exactly is going wrong in free(): are you passing a bad address to it?
Can you run the code through a debugger and/or examine core files?

-- 
Jeff Squyres
Cisco Systems
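The "Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR) Failing at addr:(nil)" line in the report appears to be built from the `siginfo_t` that the kernel passes to a signal handler installed with `SA_SIGINFO`. `SEGV_MAPERR` means the faulting address was not mapped to any object, and a `(nil)` address usually indicates a NULL pointer dereference rather than a corrupted heap. The C sketch below shows how those fields can be captured with POSIX `sigaction`; `segv_code_name`, `report_segv`, and `install_segv_reporter` are hypothetical names for illustration only, not part of Open MPI or Seqquest.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: map the SIGSEGV si_code values defined by POSIX
   to the names Open MPI printed (e.g. "si_code:1(SEGV_MAPERR)"). */
const char *segv_code_name(int code)
{
    switch (code) {
    case SEGV_MAPERR: return "SEGV_MAPERR"; /* address not mapped to an object */
    case SEGV_ACCERR: return "SEGV_ACCERR"; /* invalid permissions for a mapped object */
    default:          return "other";
    }
}

/* Write a NUL-terminated string to stderr using only write(2),
   which is async-signal-safe (unlike printf). */
static void put_stderr(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    ssize_t written = write(STDERR_FILENO, s, n);
    (void)written; /* nothing useful to do on failure inside a handler */
}

/* SA_SIGINFO handler: the kernel fills in siginfo_t, including si_code
   and si_addr (the faulting address). A NULL si_addr usually means a
   NULL pointer was dereferenced. */
static void report_segv(int sig, siginfo_t *info, void *uctx)
{
    (void)sig;
    (void)uctx;
    put_stderr("Caught SIGSEGV, si_code=");
    put_stderr(segv_code_name(info->si_code));
    put_stderr(info->si_addr == NULL ? ", faulting address is NULL\n"
                                     : ", faulting address is non-NULL\n");
    /* Mimic the 128+N status a shell shows for a process killed by signal N. */
    _exit(128 + SIGSEGV);
}

/* Install report_segv for SIGSEGV. Returns 0 on success, -1 on error. */
int install_segv_reporter(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = report_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Calling `install_segv_reporter()` early in `main()` would typically report `si_code=SEGV_MAPERR` with a NULL address for a plain NULL dereference on Linux. Judging by frame [0] in the trace, Open MPI's own library already installs a SIGSEGV handler, and a later `sigaction` call replaces it. Either way, a debugger or core file, as suggested above, gives far more detail, including which line near `rhosave_` actually faulted.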