If you run into a segv in this code, it almost certainly means that you have heap corruption somewhere. FWIW, that has *always* been what it meant when I've run into segvs in any code under opal/mca/memory/linux/. Meaning: my user code did something wrong, it created heap corruption, and then later some malloc() or free() caused a segv in this area of the code.
The malloc code in opal/mca/memory/linux/ is the same ptmalloc memory allocator that has shipped in glibc for years. I'd be hard-pressed to say that any code is 100% bug-free :-), but I'd be surprised if there were a bug in this particular chunk of code. I'd run your code through valgrind or another memory-checking debugger and see if that sheds any light on what's going on.

On Sep 6, 2012, at 12:06 AM, Yong Qin wrote:

> Hi,
>
> While debugging a mysterious crash of a code, I was able to trace down
> to a SIGSEGV in OMPI 1.6 and 1.6.1. The offending code is in
> opal/mca/memory/linux/malloc.c. Please see the following gdb log.
>
> (gdb) c
> Continuing.
>
> Program received signal SIGSEGV, Segmentation fault.
> opal_memory_ptmalloc2_int_free (av=0x2fd0637, mem=0x203a746f74512000)
>     at malloc.c:4385
> 4385 nextsize = chunksize(nextchunk);
> (gdb) l
> 4380 Consolidate other non-mmapped chunks as they arrive.
> 4381 */
> 4382
> 4383 else if (!chunk_is_mmapped(p)) {
> 4384 nextchunk = chunk_at_offset(p, size);
> 4385 nextsize = chunksize(nextchunk);
> 4386 assert(nextsize > 0);
> 4387
> 4388 /* consolidate backward */
> 4389 if (!prev_inuse(p)) {
> (gdb) bt
> #0 opal_memory_ptmalloc2_int_free (av=0x2fd0637,
>     mem=0x203a746f74512000) at malloc.c:4385
> #1 0x00002ae6b18ea0c0 in opal_memory_ptmalloc2_free (mem=0x2fd0637)
>     at malloc.c:3511
> #2 0x00002ae6b18ea736 in opal_memory_linux_free_hook
>     (__ptr=0x2fd0637, caller=0x203a746f74512000) at hooks.c:705
> #3 0x0000000001412fcc in for_dealloc_allocatable ()
> #4 0x00000000007767b1 in ALLOC::dealloc_d2 (array=@0x2fd0647,
>     name=@0x6f6e6f69006f6e78, routine=Cannot access memory at address 0x0
> ) at alloc.F90:1357
> #5 0x000000000082628c in M_LDAU::hubbard_term (scell=..., nua=@0xd5,
>     na=@0xd5, isa=..., xa=..., indxua=..., maxnh=@0xcf4ff, maxnd=@0xcf4ff,
>     lasto=..., iphorb=...,
>     numd=..., listdptr=..., listd=..., numh=..., listhptr=...,
>     listh=..., nspin=@0xcf4ff00000002, dscf=..., eldau=@0x0, deldau=@0x0,
>     fa=..., stress=..., h=...,
>     first=@0x0, last=@0x0) at ldau.F:752
> #6 0x00000000006cd532 in M_SETUP_HAMILTONIAN::setup_hamiltonian
>     (first=@0x0, last=@0x0, iscf=@0x2) at setup_hamiltonian.F:199
> #7 0x000000000070e257 in M_SIESTA_FORCES::siesta_forces
>     (istep=@0xf9a4d07000000000) at siesta_forces.F:90
> #8 0x000000000070e475 in siesta () at siesta.F:23
> #9 0x000000000045e47c in main ()
>
> Can anybody shed some light here on what could be wrong?
>
> Thanks,
>
> Yong Qin

--
Jeff Squyres
jsquy...@cisco.com