You may also want to check out libumem's undocumented firewalling feature:
http://blogs.sun.com/roller/page/peteh?entry=hidden_features_of_libumem_firewalls
This will allow the process to dump core right away when the corruption is triggered, at the expense of a lot of extra pages/memory being consumed. As far as I can tell, this is the equivalent of Microsoft's light pageheap vs. full pageheap. In a dev environment with enough memory, this can make your detective work a whole lot easier, depending on the corruption.

Ivan

-----Original Message-----
From: mdb-discuss-boun...@opensolaris.org [mailto:mdb-discuss-bounces at opensolaris.org] On Behalf Of James Coleman
Sent: Wednesday, June 28, 2006 4:15 AM
To: Xie,Zhong; mdb-discuss at opensolaris.org
Subject: Re: [mdb-discuss] trace memory corruption

Xie,Zhong wrote:
> I am wondering whether it is possible to track down memory corruption
> like the one below using umem:
>
>     char * p1= new char[8];
>     ...
>     delete p1;
>     char * p2 =new char[8];
>     ....
>     strcpy(p2, "56789");
>     strcpy(p1, "01234"); //Bug causes memory corruption
>
> p1 and p2 point to memory of the same size. It's possible that p1 and p2
> actually point to the same address. In this case, is umem still able to
> trace the memory corruption? It doesn't work for me.
>
> gcc version 3.4.3 (csl-sol210-3_4-branch+sol_rpath)
>

Hello,

With my limited experience of mdb and libumem, they are able to detect memory corruption. I was curious about how this is handled, so I poked at it a bit myself, and here is what I have learned. I hope it is a little bit useful to you.

Two things:
1. Your example above might not corrupt memory (if it is simplified).
2. The memory corruption will not be detected immediately.

1. Take the simplest possible case of your example, and add a printf to check one thing:

    #include <strings.h>
    #include <stdio.h>

    int main()
    {
        char * p1= new char[8];
        delete p1;
        char * p2 =new char[8];
        if (p2 == p1) printf("Oops! p1 == p2\n");
        strcpy(p2, "56789");
        strcpy(p1, "01234"); //Bug causes memory corruption (if p1 points to invalid area!)
    }

We see that p2 == p1, so by accident there is no corruption in this case! :) Oops!
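The address reuse seen in this first test is what you would expect from an allocator that hands back the most recently freed buffer of a given size. The following is only a toy sketch of that last-in, first-out idea. `ToyCache`, `kBufSize` and the two helper functions are hypothetical names made up for illustration, and nothing here is libumem's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kBufSize = 16;  // assumed toy buffer size

// Toy fixed-size buffer cache (hypothetical, NOT libumem code).
// Free buffers are kept on a stack, so the last one freed is reused first.
class ToyCache {
public:
    explicit ToyCache(std::size_t nbufs) : storage_(nbufs * kBufSize) {
        // Push in reverse so the lowest address is handed out first.
        for (std::size_t i = nbufs; i > 0; --i)
            free_.push_back(&storage_[(i - 1) * kBufSize]);
    }

    // Hand out the most recently freed buffer, or nullptr if none are left.
    char* alloc() {
        if (free_.empty()) return nullptr;
        char* buf = free_.back();
        free_.pop_back();
        return buf;
    }

    // Put a buffer back on top of the free stack.
    void release(char* buf) {
        assert(buf != nullptr);
        free_.push_back(buf);
    }

private:
    std::vector<char> storage_;  // backing memory, never resized
    std::vector<char*> free_;    // stack of free buffers
};

// alloc, free, alloc: the second allocation gets the same address back.
bool freed_buffer_is_reused() {
    ToyCache cache(4);
    char* p1 = cache.alloc();
    cache.release(p1);
    char* p2 = cache.alloc();
    return p1 == p2;
}

// alloc, alloc, free: the two pointers refer to different buffers.
bool late_free_gives_distinct_buffers() {
    ToyCache cache(4);
    char* p1 = cache.alloc();
    char* p2 = cache.alloc();
    cache.release(p1);
    return p1 != p2;
}
```

In this toy model, a stale p1 aliases the live p2 in the first ordering, so a write through p1 lands in memory the program still owns. In the second ordering, the same write lands in a buffer that is sitting on the free list.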
Moving the delete, we do get corruption (and we do not see the Oops):

    int main()
    {
        char * p1= new char[8];
        // no corruption if delete here, delete p1;
        char * p2 =new char[8];
        delete p1;
        if (p2 == p1) printf("Oops! p1 == p2\n");
        strcpy(p2, "56789");
        strcpy(p1, "01234"); //Bug causes memory corruption
        return 0;
    }

Now running this looks normal. No core dump.

Running with libumem:

    UMEM_DEBUG=default UMEM_LOGGING=transaction LD_PRELOAD=libumem.so.1 ~/c/testcorrupt

Nothing. Hmmmmm. (See 2. below for why this is expected.)

Now put in a sleep before the return to allow us time to attach to the process using mdb:

    #include <unistd.h>
    sleep(70);

Now using `mdb -p pid`:

    > ::umem_verify
    umem_alloc_16 3d6c8 1 corrupt buffer
    > 3d6c8::umem_verify
    Summary for cache 'umem_alloc_16'
      buffer 49fe0 (free) seems corrupted, at 0
    > 49fe0/10X
    0x49fe0: deadbeef deadbeef 30313233 3400beef feedface feedface 54780 f4ebb36e deadbeef deadbeef

By examining what is in the corrupt buffer you might be able to tell where it came from.

    > ::umalog

    T-0.000000000 addr=49fe0 umem_alloc_16
        libumem.so.1`umem_cache_free+0x4c
        libumem.so.1`process_free+0x68
        libumem.so.1`free+0x38
        libstdc++.so.6.0.3`_ZdlPv+0x10
        main+0x28
        _start+0x5c

    T-0.000031250 addr=49fc0 umem_alloc_16
        libumem.so.1`umem_cache_alloc+0x13c
        libumem.so.1`umem_alloc+0x44
        libumem.so.1`malloc+0x2c
        libstdc++.so.6.0.3`_Znwj+0x1c
        libstdc++.so.6.0.3`_Znaj+4
        main+0x18
        _start+0x5c

    T-0.000053750 addr=49fe0 umem_alloc_16
        libumem.so.1`umem_cache_alloc+0x13c
        libumem.so.1`umem_alloc+0x44
        libumem.so.1`malloc+0x2c
        libstdc++.so.6.0.3`_Znwj+0x1c
        libstdc++.so.6.0.3`_Znaj+4
        main+8
        _start+0x5c
    >

By grepping the umalog you can find where the buffer that was corrupted was malloc'ed or freed from.

2. Memory corruption is detected when a buffer with corrupted redzones is freed. You can also attach to the process and run ::umem_verify and friends. When freed memory is used by another malloc maybe the corruption is detected? Not sure.
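One way to picture the delayed detection in point 2: the dump above shows the free buffer filled with deadbeef, and the "buffer modified after being freed" message later in this mail suggests that this fill pattern is checked when the buffer is handed out again. Below is a toy sketch of that idea under those assumptions. `ToyDebugBuffer` and the helper functions are hypothetical names, and the real libumem checks are more involved (redzones, bufctls, logs).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::uint32_t kFreePattern = 0xdeadbeef;  // fill value seen in the dump
constexpr std::size_t kWords = 2;                   // toy 8-byte buffer

// Toy single-buffer debug allocator (hypothetical, NOT libumem code).
class ToyDebugBuffer {
public:
    ToyDebugBuffer() { fill_free_pattern(); }

    // Index of the first word that no longer holds the free pattern,
    // or -1 if the buffer is allocated or still intact.
    int first_corrupt_word() const {
        if (!is_free_) return -1;
        for (std::size_t i = 0; i < kWords; ++i)
            if (words_[i] != kFreePattern) return static_cast<int>(i);
        return -1;
    }

    // Hand the buffer out. The free pattern is checked first, so an earlier
    // write through a stale pointer is only reported here.
    char* alloc(bool* corrupted) {
        assert(is_free_);
        *corrupted = (first_corrupt_word() != -1);
        is_free_ = false;
        return reinterpret_cast<char*>(words_);
    }

    // Take the buffer back and refill it with the free pattern.
    void release() {
        fill_free_pattern();
        is_free_ = true;
    }

private:
    void fill_free_pattern() {
        for (std::size_t i = 0; i < kWords; ++i) words_[i] = kFreePattern;
    }

    std::uint32_t words_[kWords];
    bool is_free_ = true;
};

// Write after free: nothing notices the write itself, the next alloc does.
bool stale_write_detected_on_next_alloc() {
    ToyDebugBuffer buf;
    bool corrupted = false;
    char* p1 = buf.alloc(&corrupted);
    buf.release();                // p1 is now stale
    std::memcpy(p1, "01234", 6);  // the bug: write through the stale pointer
    buf.alloc(&corrupted);        // pattern check happens here
    return corrupted;
}

// The same damage is visible to an on-demand scan of the free buffer.
bool stale_write_visible_to_verify() {
    ToyDebugBuffer buf;
    bool corrupted = false;
    char* p1 = buf.alloc(&corrupted);
    buf.release();
    std::memcpy(p1, "01234", 6);
    return buf.first_corrupt_word() == 0;
}

// Writing while the buffer is allocated is fine and is not flagged later.
bool clean_reuse_passes() {
    ToyDebugBuffer buf;
    bool corrupted = false;
    char* p1 = buf.alloc(&corrupted);
    std::memcpy(p1, "56789", 6);
    buf.release();
    buf.alloc(&corrupted);
    return !corrupted;
}
```

The on-demand scan plays the role that ::umem_verify plays in the session above: the damage sits in the free buffer until something looks at it.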
Didn't

Of course you cannot validate all memory and look for corruption after every malloc/free. This would make things very slow.

This article:
http://access1.sun.com/techarticles/libumem.html
describes how to use gcore to make the process (running under libumem) dump core, and then run ::umem_verify and friends on it. Or attach to the process while it is still running but after the corruption has happened (for that, put in a big sleep at the end).

A final note: what if I had other "stuff" instead of the sleep after the corruption, for example a new (which would use the corrupt buffer)?

"stuff":

    char * p3= new char[8];
    if (p3 == p1) printf("Yes. p1 == p3\n");
    for(int i=0;i<100;i++);
    return 0;

Oops, silly me! I left whatever I was going to do with the for loop undone. But regardless of that, umem dumped core (it looks like it was on that new):

    [jcoleman at slaine] ~/c/$ UMEM_DEBUG=default UMEM_LOGGING=transaction LD_PRELOAD=libumem.so.1 ~/c/testcorrupt
    Abort (core dumped)
    [jcoleman at slaine] ~/c/$ mdb core
    Loading modules: [ libumem.so.1 libc.so.1 ld.so.1 ]
    > $c
    libc.so.1`_kill+8(1, 64, 65640000, 7efefeff, 81010100, ff00)
    libumem.so.1`umem_err_recoverable+0x74(ff360cac, fffffff7, ffffffff, 3b288b8f, 4bfb0, 3d720)
    libumem.so.1`umem_error+0x49c(1, ff377010, 0, 49fe0, 3d780, 10)
    libumem.so.1`umem_cache_alloc_debug+0xf0(3d6c8, 49fe0, 0, ff356d8c, 0, 0)
    libumem.so.1`umem_cache_alloc+0x208(49fe0, 0, 0, 0, 0, 0)
    libumem.so.1`umem_alloc+0x44(10, 0, 0, 0, 0, 0)
    libumem.so.1`malloc+0x2c(8, ffbff0d8, 0, 0, ffbff188, ff1bc000)
    libstdc++.so.6.0.3`_Znwj+0x1c(8, ffbff188, 0, 0, 0, ff19ff1c)
    libstdc++.so.6.0.3`_Znaj+4(8, 10950, 49fe8, 34000000, 3400, 49fc8)
    main+0x8c(1, ffbff2ac, ffbff2b4, 20bb0, 0, 0)
    _start+0x5c(0, 0, 0, 0, 0, 0)
    > ::umem_status
    Status: ready and active
    Concurrency: 1
    Logs: transaction=64k (inactive)
    Message buffer:
    umem allocator: buffer modified after being freed
    modification occurred at offset 0x8 (0xdeadbeefdeadbeef replaced by 0x303132333400beef)
    buffer=49fe0 bufctl=54780 cache: umem_alloc_16
    previous transaction on buffer 49fe0:
    thread=1 time=T-6.992512911 slab=4bfb0 cache: umem_alloc_16
    libumem.so.1'umem_cache_free+0x4c
    libumem.so.1'?? (0xff353868)
    libumem.so.1'free+0x38
    libstdc++.so.6.0.3'_ZdlPv+0x10
    testcorrupt'main+0x28
    testcorrupt'_start+0x5c
    umem: heap corruption detected
    stack trace:
    libumem.so.1'?? (0xff3554c8)
    libumem.so.1'?? (0xff356508)
    libumem.so.1'umem_cache_alloc+0x208
    libumem.so.1'umem_alloc+0x44
    libumem.so.1'malloc+0x2c
    libstdc++.so.6.0.3'_Znwj+0x1c
    libstdc++.so.6.0.3'_Znaj+0x4
    testcorrupt'main+0x8c
    testcorrupt'_start+0x5c

Hooray. Isn't that nice :)

James.

As a side note, I am using GNU g++ as the compiler, with -g3 for debug info.

_______________________________________________
mdb-discuss mailing list
mdb-discuss at opensolaris.org