[mdb-discuss] trace memory corruption

Steven Xie Tue, 04 Jul 2006 10:46:00 -0400


Frank Hofmann wrote:


> [ ... ]
>
>> The key to the bug is that p1 can't be used after the delete, even though 
>> p1 still points to an accessible address.
>> That's why I am wondering whether umem can possibly track down this kind 
>> of bug. It doesn't work for me.
>
>
> No memory allocator that re-uses previously-freed virtual addresses 
> can detect/track this.
>
> What umem/kmem debugging support does, though, is to give you the 
> buffer history. I.e. you _see_ who allocated/freed this piece of 
> memory. Which gives you a handle how to narrow down your search for 
> possible culprits.
>
> I have an actual kernel crashdump where exactly this situation has 
> occurred and kmem's buffer allocation/free history was the key to 
> finding the culprit/fixing the bug. If you wish to try, I'll make it 
> available, and comment on it as needed.

That would be great! I'd like to give it a shot. Could you give me some hints 
on kmem? Thanks.

> Whether that works for you depends on the frequency of alloc/free for 
> this buffer. The history only goes back so far.
>
> Do we have the userland equivalent of "::kgrep" ? I'm too rarely 
> looking into application dumps ...
>
> FrankH.
>
>>
>>> 2. The memory corruption will not be detected immediately.
>>>
>>>
>>> 1.
>>>
>>> If I take the simplest possible case of your example.
>>> And add a printf to check one thing.
>>>
>>> #include <string.h>
>>> #include <stdio.h>
>>>
>>> int main()
>>> {
>>>  char * p1= new char[8];
>>>  delete p1;  // strictly, memory from new[] should be released with delete[]
>>>  char * p2 =new char[8];
>>>
>>>  if (p2 == p1) printf("Oops! p1 == p2\n");
>>>
>>>  strcpy(p2, "56789");
>>>  // Bug: p1 is used after delete (corrupts memory if p1 points to an invalid area!)
>>>  strcpy(p1, "01234");
>>> }
>>>
>>> We see that p2 == p1: the allocator handed the freed buffer straight 
>>> back, so by accident there is no corruption in this case!
>>> :) oops!
>>>
>>> Moving the delete we do get corruption (and we do not see the Oops):
>>>
>>> int main()
>>> {
>>>  char * p1= new char[8];
>>>  // no corruption if delete here, delete p1;
>>>  char * p2 =new char[8];
>>>  delete p1;
>>>
>>>  if (p2 == p1) printf("Oops! p1 == p2\n");
>>>
>>>  strcpy(p2, "56789");
>>>  strcpy(p1, "01234");  //Bug  causes memory corruption
>>>
>>>  return 0;
>>> }
>>>
>>> Now running this looks normal. No coredump.
>>> Running with libumem:
>>> UMEM_DEBUG=default UMEM_LOGGING=transaction LD_PRELOAD=libumem.so.1 
>>> ~/c/testcorrupt
>>> nothing.
>>> Hmmmmm. (see 2. below for why this is expected)
>>>
>>>
>>> Now put in a sleep before the return to allow us time to attach to 
>>> the process using mdb:
>>> #include <unistd.h>
>>>  sleep(70);
>>> Now using `mdb -p pid` and
>>>
>>> > ::umem_verify
>>>
>>> umem_alloc_16                      3d6c8 1 corrupt buffer
>>>
>>>
>>> > 3d6c8::umem_verify
>>> Summary for cache 'umem_alloc_16'
>>>   buffer 49fe0 (free) seems corrupted, at 0
>>>
>>>
>>> > 49fe0/10X
>>> 0x49fe0:        deadbeef        deadbeef        30313233        3400beef
>>>                 feedface        feedface        54780           f4ebb36e
>>>                 deadbeef        deadbeef
>>>
>>> By examining what is in the corrupt buffer you might be able to tell 
>>> where it came from.
>>>
>>> > ::umalog
>>>
>>> T-0.000000000  addr=49fe0  umem_alloc_16
>>>          libumem.so.1`umem_cache_free+0x4c
>>>          libumem.so.1`process_free+0x68
>>>          libumem.so.1`free+0x38
>>>          libstdc++.so.6.0.3`_ZdlPv+0x10
>>>          main+0x28
>>>          _start+0x5c
>>>
>>> T-0.000031250  addr=49fc0  umem_alloc_16
>>>          libumem.so.1`umem_cache_alloc+0x13c
>>>          libumem.so.1`umem_alloc+0x44
>>>          libumem.so.1`malloc+0x2c
>>>          libstdc++.so.6.0.3`_Znwj+0x1c
>>>          libstdc++.so.6.0.3`_Znaj+4
>>>          main+0x18
>>>          _start+0x5c
>>>
>>> T-0.000053750  addr=49fe0  umem_alloc_16
>>>          libumem.so.1`umem_cache_alloc+0x13c
>>>          libumem.so.1`umem_alloc+0x44
>>>          libumem.so.1`malloc+0x2c
>>>          libstdc++.so.6.0.3`_Znwj+0x1c
>>>          libstdc++.so.6.0.3`_Znaj+4
>>>          main+8
>>>          _start+0x5c
>>> >
>>>
>>> By searching the ::umalog output for the address of the corrupted 
>>> buffer you can find where it was allocated and where it was freed.
>>>
>>>
>>> 2.
>>>
>>> Memory corruption is detected when a buffer with corrupted redzones 
>>> is freed.
>>> You can also attach to the process and run ::umem_verify and friends.
>>> When freed memory is used by another malloc maybe the corruption is 
>>> detected?
>>> Not sure. Didn't
>>>
>>> Of course you cannot validate all memory and look for corruption 
>>> after every malloc/free.
>>> This would make things very slow.
>>>
>>> This article:
>>> http://access1.sun.com/techarticles/libumem.html
>>> describes how to use gcore to make the process (running under libumem) 
>>> dump core and then run ::umem_verify and friends. Alternatively, attach 
>>> to the process while it is still running, after putting in a big sleep 
>>> at the end.
>>>
>>>
>>>
>>> A final note.
>>>
>>> Suppose I had other "stuff" in place of the sleep after the corruption: 
>>> a new (which would reuse the corrupt buffer).
>>>
>>> "stuff":
>>>  char * p3= new char[8];
>>>  if (p3 == p1) printf("Yes. p1 == p3\n");
>>>
>>>  for(int i=0;i<100;i++);
>>>
>>>  return 0;
>>>
>>> Oops, silly me! I left whatever I was going to do with the for loop 
>>> undone. But regardless of that, umem dumped core (it looked like it was 
>>> on that new):
>>>
>>> [jcoleman@slaine] ~/c/$ UMEM_DEBUG=default UMEM_LOGGING=transaction LD_PRELOAD=libumem.so.1 ~/c/testcorrupt
>>> Abort (core dumped)
>>>
>>> [jcoleman@slaine] ~/c/$ mdb core
>>> Loading modules: [ libumem.so.1 libc.so.1 ld.so.1 ]
>>> > $c
>>> libc.so.1`_kill+8(1, 64, 65640000, 7efefeff, 81010100, ff00)
>>> libumem.so.1`umem_err_recoverable+0x74(ff360cac, fffffff7, ffffffff, 3b288b8f, 4bfb0, 3d720)
>>> libumem.so.1`umem_error+0x49c(1, ff377010, 0, 49fe0, 3d780, 10)
>>> libumem.so.1`umem_cache_alloc_debug+0xf0(3d6c8, 49fe0, 0, ff356d8c, 0, 0)
>>> libumem.so.1`umem_cache_alloc+0x208(49fe0, 0, 0, 0, 0, 0)
>>> libumem.so.1`umem_alloc+0x44(10, 0, 0, 0, 0, 0)
>>> libumem.so.1`malloc+0x2c(8, ffbff0d8, 0, 0, ffbff188, ff1bc000)
>>> libstdc++.so.6.0.3`_Znwj+0x1c(8, ffbff188, 0, 0, 0, ff19ff1c)
>>> libstdc++.so.6.0.3`_Znaj+4(8, 10950, 49fe8, 34000000, 3400, 49fc8)
>>> main+0x8c(1, ffbff2ac, ffbff2b4, 20bb0, 0, 0)
>>> _start+0x5c(0, 0, 0, 0, 0, 0)
>>>
>>> > ::umem_status
>>> Status:         ready and active
>>> Concurrency:    1
>>> Logs:           transaction=64k (inactive)
>>> Message buffer:
>>> umem allocator: buffer modified after being freed
>>> modification occurred at offset 0x8 (0xdeadbeefdeadbeef replaced by 0x303132333400beef)
>>> buffer=49fe0  bufctl=54780  cache: umem_alloc_16
>>> previous transaction on buffer 49fe0:
>>> thread=1  time=T-6.992512911  slab=4bfb0  cache: umem_alloc_16
>>> libumem.so.1'umem_cache_free+0x4c
>>> libumem.so.1'?? (0xff353868)
>>> libumem.so.1'free+0x38
>>> libstdc++.so.6.0.3'_ZdlPv+0x10
>>> testcorrupt'main+0x28
>>> testcorrupt'_start+0x5c
>>> umem: heap corruption detected
>>> stack trace:
>>> libumem.so.1'?? (0xff3554c8)
>>> libumem.so.1'?? (0xff356508)
>>> libumem.so.1'umem_cache_alloc+0x208
>>> libumem.so.1'umem_alloc+0x44
>>> libumem.so.1'malloc+0x2c
>>> libstdc++.so.6.0.3'_Znwj+0x1c
>>> libstdc++.so.6.0.3'_Znaj+0x4
>>> testcorrupt'main+0x8c
>>> testcorrupt'_start+0x5c
>>>
>>> Hooray.
>>>
>>> Isn't that nice :)
>>>
>>> James.
>>>
>>>
>>> As a side-note I am using gnu g++ as a compiler. With -g3 for debug 
>>> info.
>>>
