[mdb-discuss] trace memory corruption

Frank Hofmann Thu, 29 Jun 2006 19:57:05 +0200 (MEST)

[ ... ]
> The key to the bug is that p1 can't be used after delete, even though p1 
> still points to an accessible address.
> That's why I'm wondering whether umem can possibly track down this kind of 
> bug.
> It doesn't work for me.


No memory allocator that re-uses previously-freed virtual addresses can 
detect/track this.

What umem/kmem debugging support does give you, though, is the buffer 
history, i.e. you _see_ who allocated/freed this piece of memory. That 
gives you a handle on narrowing down your search for possible culprits.

I have an actual kernel crashdump where exactly this situation has 
occurred and kmem's buffer allocation/free history was the key to finding 
the culprit/fixing the bug. If you wish to try, I'll make it available, 
and comment on it as needed.
Whether that works for you depends on the frequency of alloc/free for this 
buffer. The history only goes back so far.

Do we have a userland equivalent of "::kgrep"? I look into application 
dumps too rarely ...

FrankH.

>> 2. The memory corruption will not be detected immediately.
>> 
>> 
>> 1.
>> 
>> If I take the simplest possible case of your example.
>> And add a printf to check one thing.
>> 
>> #include <string.h>
>> #include <stdio.h>
>> 
>> int main()
>> {
>>  char * p1= new char[8];
>>  delete p1;
>>  char * p2 =new char[8];
>> 
>>  if (p2 == p1) printf("Oops! p1 == p2\n");
>> 
>>  strcpy(p2, "56789");
>>  strcpy(p1, "01234");  // Bug: causes memory corruption (if p1 points to an invalid area!)
>> }
>> 
>> We see that p2 == p1 (the allocator reused the freed buffer), so by 
>> accident there is no corruption in this case!
>> :) oops!
>> 
>> Moving the delete we do get corruption (and we do not see the Oops):
>> 
>> int main()
>> {
>>  char * p1= new char[8];
>>  // no corruption if delete here, delete p1;
>>  char * p2 =new char[8];
>>  delete p1;
>> 
>>  if (p2 == p1) printf("Oops! p1 == p2\n");
>> 
>>  strcpy(p2, "56789");
>>  strcpy(p1, "01234");  //Bug  causes memory corruption
>> 
>>  return 0;
>> }
>> 
>> Now running this looks normal. No coredump.
>> Running with libumem:
>> UMEM_DEBUG=default UMEM_LOGGING=transaction LD_PRELOAD=libumem.so.1 ~/c/testcorrupt
>> nothing.
>> Hmmmmm. (see 2. below for why this is expected)
>> 
>> 
>> Now put in a sleep before the return to allow us time to attach to the 
>> process using mdb:
>> #include <unistd.h>
>>  sleep(70);
>> Now using `mdb -p pid` and
>> 
>> > ::umem_verify
>> 
>> umem_alloc_16                      3d6c8 1 corrupt buffer
>> 
>> 
>> > 3d6c8::umem_verify
>> Summary for cache 'umem_alloc_16'
>>   buffer 49fe0 (free) seems corrupted, at 0
>> 
>> 
>> > 49fe0/10X
>> 0x49fe0:        deadbeef        deadbeef        30313233        3400beef
>>                 feedface        feedface        54780           f4ebb36e
>>                 deadbeef        deadbeef
>> 
>> By examining what is in the corrupt buffer you might be able to tell where 
>> it came from.
>> 
>> > ::umalog
>> 
>> T-0.000000000  addr=49fe0  umem_alloc_16
>>          libumem.so.1`umem_cache_free+0x4c
>>          libumem.so.1`process_free+0x68
>>          libumem.so.1`free+0x38
>>          libstdc++.so.6.0.3`_ZdlPv+0x10
>>          main+0x28
>>          _start+0x5c
>> 
>> T-0.000031250  addr=49fc0  umem_alloc_16
>>          libumem.so.1`umem_cache_alloc+0x13c
>>          libumem.so.1`umem_alloc+0x44
>>          libumem.so.1`malloc+0x2c
>>          libstdc++.so.6.0.3`_Znwj+0x1c
>>          libstdc++.so.6.0.3`_Znaj+4
>>          main+0x18
>>          _start+0x5c
>> 
>> T-0.000053750  addr=49fe0  umem_alloc_16
>>          libumem.so.1`umem_cache_alloc+0x13c
>>          libumem.so.1`umem_alloc+0x44
>>          libumem.so.1`malloc+0x2c
>>          libstdc++.so.6.0.3`_Znwj+0x1c
>>          libstdc++.so.6.0.3`_Znaj+4
>>          main+8
>>          _start+0x5c
>> >
>> 
>> By grepping the umalog for the address of the corrupted buffer you can 
>> find where it was allocated and freed.
>> 
>> 
>> 2.
>> 
>> Memory corruption is detected when a buffer with corrupted redzones is 
>> freed.
>> You can also attach to the process and run ::umem_verify and friends.
>> When freed memory is used by another malloc maybe the corruption is 
>> detected?
>> Not sure. Didn't
>> 
>> Of course you cannot validate all memory and look for corruption after 
>> every malloc/free.
>> This would make things very slow.
>> 
>> This article:
>> http://access1.sun.com/techarticles/libumem.html
>> describes how to use gcore to make the process (running under libumem) 
>> dump core and then run ::umem_verify and friends on it. Or attach to the 
>> process while it is still running, e.g. after putting in a big sleep at 
>> the end.
>> 
>> 
>> 
>> A final note.
>> 
>> Suppose I had other "stuff" instead of the sleep after the corruption:
>> a new (which would use the corrupt buffer).
>> 
>> "stuff":
>>  char * p3= new char[8];
>>  if (p3 == p1) printf("Yes. p1 == p3\n");
>> 
>>  for(int i=0;i<100;i++);
>> 
>>  return 0;
>> 
>> Oops, silly me! I left whatever I was going to do with the for loop undone.
>> But regardless of that, umem dumped core (it looked like it was on that 
>> new):
>> 
>> [jcoleman at slaine] ~/c/$ UMEM_DEBUG=default UMEM_LOGGING=transaction LD_PRELOAD=libumem.so.1 ~/c/testcorrupt
>> Abort (core dumped)
>> 
>> [jcoleman at slaine] ~/c/$ mdb core
>> Loading modules: [ libumem.so.1 libc.so.1 ld.so.1 ]
>> > $c
>> libc.so.1`_kill+8(1, 64, 65640000, 7efefeff, 81010100, ff00)
>> libumem.so.1`umem_err_recoverable+0x74(ff360cac, fffffff7, ffffffff, 3b288b8f, 4bfb0, 3d720)
>> libumem.so.1`umem_error+0x49c(1, ff377010, 0, 49fe0, 3d780, 10)
>> libumem.so.1`umem_cache_alloc_debug+0xf0(3d6c8, 49fe0, 0, ff356d8c, 0, 0)
>> libumem.so.1`umem_cache_alloc+0x208(49fe0, 0, 0, 0, 0, 0)
>> libumem.so.1`umem_alloc+0x44(10, 0, 0, 0, 0, 0)
>> libumem.so.1`malloc+0x2c(8, ffbff0d8, 0, 0, ffbff188, ff1bc000)
>> libstdc++.so.6.0.3`_Znwj+0x1c(8, ffbff188, 0, 0, 0, ff19ff1c)
>> libstdc++.so.6.0.3`_Znaj+4(8, 10950, 49fe8, 34000000, 3400, 49fc8)
>> main+0x8c(1, ffbff2ac, ffbff2b4, 20bb0, 0, 0)
>> _start+0x5c(0, 0, 0, 0, 0, 0)
>> 
>> > ::umem_status
>> Status:         ready and active
>> Concurrency:    1
>> Logs:           transaction=64k (inactive)
>> Message buffer:
>> umem allocator: buffer modified after being freed
>> modification occurred at offset 0x8 (0xdeadbeefdeadbeef replaced by 0x303132333400beef)
>> buffer=49fe0  bufctl=54780  cache: umem_alloc_16
>> previous transaction on buffer 49fe0:
>> thread=1  time=T-6.992512911  slab=4bfb0  cache: umem_alloc_16
>> libumem.so.1'umem_cache_free+0x4c
>> libumem.so.1'?? (0xff353868)
>> libumem.so.1'free+0x38
>> libstdc++.so.6.0.3'_ZdlPv+0x10
>> testcorrupt'main+0x28
>> testcorrupt'_start+0x5c
>> umem: heap corruption detected
>> stack trace:
>> libumem.so.1'?? (0xff3554c8)
>> libumem.so.1'?? (0xff356508)
>> libumem.so.1'umem_cache_alloc+0x208
>> libumem.so.1'umem_alloc+0x44
>> libumem.so.1'malloc+0x2c
>> libstdc++.so.6.0.3'_Znwj+0x1c
>> libstdc++.so.6.0.3'_Znaj+0x4
>> testcorrupt'main+0x8c
>> testcorrupt'_start+0x5c
>> 
>> Hooray.
>> 
>> Isn't that nice :)
>> 
>> James.
>> 
>> 
>> As a side note, I am using GNU g++ as the compiler, with -g3 for debug info.
>> 

==========================================================================
No good can come from selling your freedom, not for all gold of the world,
for the value of this heavenly gift exceeds that of any fortune on earth.
==========================================================================
