[mdb-discuss] trace memory corruption

Berg, Ivan Michael (Ivan) Wed, 28 Jun 2006 08:26:53 -0600

You may also want to check out libumem's undocumented firewalling
feature:
http://blogs.sun.com/roller/page/peteh?entry=hidden_features_of_libumem_firewalls

This will allow the process to coredump right away when the corruption
is triggered, at the expense of a lot of extra pages/memory being
consumed. As far as I can tell, the difference between regular umem
debugging and firewalls is comparable to Microsoft's light pageheap
versus full pageheap.

If you are in a dev environment with enough memory, this can make your
detective work a whole lot easier, depending on the corruption.
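
To make the idea concrete, here is a minimal sketch of the guard-page
technique in general, not of libumem's firewall implementation. The
names guarded_alloc, guarded_free and overrun_is_trapped are made up for
this example, and it assumes a POSIX system with mmap/mprotect. Each
buffer is placed so that it ends right in front of an inaccessible page,
so the first byte written past the end faults immediately:

```cpp
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

// Hypothetical helpers for illustration; this is not libumem code.

// Bytes mapped for a request of `size` bytes: enough whole pages for the
// data plus one trailing guard page.
static std::size_t guarded_span(std::size_t size, std::size_t page) {
    return ((size + page - 1) / page + 1) * page;
}

// Returns `size` writable bytes whose last byte sits directly in front of
// an inaccessible page, or nullptr on failure. Alignment of the returned
// pointer is ignored to keep the sketch short.
void* guarded_alloc(std::size_t size) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t total = guarded_span(size, page);
    void* base = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANON, -1, 0);
    if (base == MAP_FAILED) return nullptr;
    char* guard = static_cast<char*>(base) + total - page;
    if (mprotect(guard, page, PROT_NONE) != 0) {
        munmap(base, total);
        return nullptr;
    }
    return guard - size;
}

// Unmaps a buffer obtained from guarded_alloc(size).
void guarded_free(void* p, std::size_t size) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t total = guarded_span(size, page);
    char* guard = static_cast<char*>(p) + size;
    munmap(guard + page - total, total);
}

// Performs a one-byte overrun in a child process and reports whether the
// child failed to finish normally, i.e. the guard page stopped it.
bool overrun_is_trapped() {
    volatile char* p = static_cast<char*>(guarded_alloc(8));
    if (p == nullptr) return false;
    pid_t pid = fork();
    if (pid < 0) return false;
    if (pid == 0) {
        p[8] = 'x';  // first byte past the buffer: lands on the guard page
        _exit(0);    // only reached if the write was not trapped
    }
    int status = 0;
    waitpid(pid, &status, 0);
    guarded_free(const_cast<char*>(p), 8);
    return !(WIFEXITED(status) && WEXITSTATUS(status) == 0);
}
```

The cost is visible in guarded_span: an 8-byte request consumes two full
pages, which is the "lot of extra pages/memory" trade-off mentioned
above.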

Ivan 

-----Original Message-----
From: mdb-discuss-boun...@opensolaris.org
[mailto:mdb-discuss-bounces at opensolaris.org] On Behalf Of James Coleman
Sent: Wednesday, June 28, 2006 4:15 AM
To: Xie,Zhong; mdb-discuss at opensolaris.org
Subject: Re: [mdb-discuss] trace memory corruption

Xie,Zhong wrote:
> I am wondering whether it is possible to track down memory corruption
> like the one below using umem:
> 
> char * p1= new char[8];
> ...
> delete p1;
> char * p2 =new char[8];
> ....
> strcpy(p2, "56789");
> strcpy(p1, "01234");  //Bug  causes memory corruption
> 
> p1 and p2 point to memory of the same size. It's possible that p1 and p2
> actually point to the same address. In this case, can umem still
> trace down the memory corruption? It doesn't work for me.
> gcc version 3.4.3 (csl-sol210-3_4-branch+sol_rpath)
>  

Hello,

With my limited experience with mdb and libumem they are able to detect
memory corruption.
I was curious about how this was handled so I poked at it a bit myself
and here is what I have learned. I hope it is a little bit useful to
you.

Two things:

1. Your example above might not corrupt memory (if it is simplified).
2. The memory corruption will not be detected immediately.

1.

If I take the simplest possible case of your example and add a printf to
check one thing:

#include <string.h>
#include <stdio.h>

int main()
{
  char * p1= new char[8];
  delete p1;
  char * p2 =new char[8];

  if (p2 == p1) printf("Oops! p1 == p2\n");

  strcpy(p2, "56789");
  strcpy(p1, "01234");  //Bug causes memory corruption (if p1 points to invalid area!)
}

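We see that p2 == p1: the allocator handed the freed buffer straight back.
So by accident the write through p1 lands in a live buffer, and in this
case there is no heap corruption for umem to find. :) oops!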
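
Why would p2 come back equal to p1? Many allocators keep freed buffers of
one size on a last-in, first-out list, so the most recently freed buffer
is the next one handed out. The toy cache below is only a sketch of that
behaviour under that assumption; ToyCache is a made-up class, and
libumem's real caches are far more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy cache for illustration only; not libumem's design.
class ToyCache {
public:
    explicit ToyCache(std::size_t buf_size) : buf_size_(buf_size) {}
    ToyCache(const ToyCache&) = delete;
    ToyCache& operator=(const ToyCache&) = delete;
    ~ToyCache() {
        for (char* b : owned_) delete[] b;
    }

    // Hand out the most recently released buffer if there is one,
    // otherwise create a fresh buffer.
    void* alloc() {
        if (!free_.empty()) {
            void* p = free_.back();
            free_.pop_back();
            return p;
        }
        owned_.push_back(new char[buf_size_]);
        return owned_.back();
    }

    // Put a buffer back on top of the free stack.
    void release(void* p) { free_.push_back(p); }

private:
    std::size_t buf_size_;
    std::vector<char*> owned_;  // every buffer ever created
    std::vector<void*> free_;   // LIFO stack of released buffers
};
```

With such a cache, alloc / release / alloc returns the same address twice,
which is the p1 == p2 case. With alloc / alloc / release, the second
buffer is distinct and the released one stays on the free list, which is
the situation the next example sets up.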

Moving the delete we do get corruption (and we do not see the Oops):

int main()
{
  char * p1= new char[8];
  // no corruption if delete here, delete p1;
  char * p2 =new char[8];
  delete p1;

  if (p2 == p1) printf("Oops! p1 == p2\n");

  strcpy(p2, "56789");
  strcpy(p1, "01234");  //Bug  causes memory corruption

  return 0;
}

Now running this looks normal. No coredump.
Running with libumem:
UMEM_DEBUG=default UMEM_LOGGING=transaction LD_PRELOAD=libumem.so.1 ~/c/testcorrupt
shows nothing either.
Hmmmmm. (see 2. below for why this is expected)

Now put in a sleep before the return to allow us time to attach to the
process using mdb:
#include <unistd.h>
  sleep(70);
Now using `mdb -p pid` and

 > ::umem_verify

umem_alloc_16                      3d6c8 1 corrupt buffer

 > 3d6c8::umem_verify
Summary for cache 'umem_alloc_16'
   buffer 49fe0 (free) seems corrupted, at 0

 > 49fe0/10X
0x49fe0:        deadbeef        deadbeef        30313233        3400beef        feedface
                feedface        54780           f4ebb36e        deadbeef        deadbeef

By examining what is in the corrupt buffer you might be able to tell
where it came from.
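
For example, the two words 30313233 3400beef in the dump are not the
deadbeef pattern around them. The sketch below turns such words back into
the bytes that sit in memory, assuming a big-endian machine (which the
byte order in this dump suggests). words_to_bytes is a made-up helper for
this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Expand 32-bit words, as printed by mdb's /X format on a big-endian
// machine, back into the sequence of bytes stored in memory.
std::string words_to_bytes(const std::vector<std::uint32_t>& words) {
    std::string bytes;
    for (std::uint32_t w : words) {
        bytes.push_back(static_cast<char>((w >> 24) & 0xff));
        bytes.push_back(static_cast<char>((w >> 16) & 0xff));
        bytes.push_back(static_cast<char>((w >> 8) & 0xff));
        bytes.push_back(static_cast<char>(w & 0xff));
    }
    return bytes;
}
```

Feeding it 0x30313233 and 0x3400beef gives the characters '0' '1' '2' '3'
'4', a NUL terminator, and then the bytes be ef left over from the
deadbeef fill. That is exactly the "01234" written by the stray strcpy.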

 > ::umalog

T-0.000000000  addr=49fe0  umem_alloc_16
          libumem.so.1`umem_cache_free+0x4c
          libumem.so.1`process_free+0x68
          libumem.so.1`free+0x38
          libstdc++.so.6.0.3`_ZdlPv+0x10
          main+0x28
          _start+0x5c

T-0.000031250  addr=49fc0  umem_alloc_16
          libumem.so.1`umem_cache_alloc+0x13c
          libumem.so.1`umem_alloc+0x44
          libumem.so.1`malloc+0x2c
          libstdc++.so.6.0.3`_Znwj+0x1c
          libstdc++.so.6.0.3`_Znaj+4
          main+0x18
          _start+0x5c

T-0.000053750  addr=49fe0  umem_alloc_16
          libumem.so.1`umem_cache_alloc+0x13c
          libumem.so.1`umem_alloc+0x44
          libumem.so.1`malloc+0x2c
          libstdc++.so.6.0.3`_Znwj+0x1c
          libstdc++.so.6.0.3`_Znaj+4
          main+8
          _start+0x5c
 >

By grepping the umalog for the address of the corrupted buffer you can
find where that buffer was allocated and where it was freed.
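
The C++ frames in these traces are mangled names. A small sketch for
decoding them, assuming a g++ (or compatible) toolchain that provides
abi::__cxa_demangle in <cxxabi.h>; the demangle wrapper itself is a
made-up convenience function:

```cpp
#include <cxxabi.h>
#include <cassert>
#include <cstdlib>
#include <string>

// Returns the demangled form of a C++ symbol, or the input unchanged if
// it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        std::free(out);
        return mangled;
    }
    std::string result(out);
    std::free(out);  // __cxa_demangle allocates the result with malloc
    return result;
}
```

Applied to the log above, _Znaj is operator new[](unsigned int), _Znwj is
operator new(unsigned int), and _ZdlPv is operator delete(void*). So the
first entry is the delete of p1 and the other two are the new char[8]
calls.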

2.

Memory corruption is detected when a buffer with corrupted redzones is
freed.
You can also attach to the process and run ::umem_verify and friends.
When freed memory is handed out again by a later malloc, the corruption
may also be detected. I was not sure at this point and had not tested it;
the final note below shows that it is.

Of course you cannot validate all memory and look for corruption after
every malloc/free.
This would make things very slow.
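
A rough sketch of why the check is deferred: a debugging allocator can
fill a buffer with a known pattern when it is freed and only look at it
again at a later point, such as the next allocation of that buffer. The
function names below are made up for this illustration and this is not
libumem's code; the pattern is the deadbeef fill seen in the dump
earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// 0xdeadbeef twice, in the byte order shown in the buffer dump.
static const unsigned char kFreePattern[8] = {0xde, 0xad, 0xbe, 0xef,
                                              0xde, 0xad, 0xbe, 0xef};

// Called when a buffer is freed: overwrite it with the pattern.
// `size` is assumed to be a multiple of 8.
void fill_free_pattern(unsigned char* buf, std::size_t size) {
    for (std::size_t off = 0; off + 8 <= size; off += 8)
        std::memcpy(buf + off, kFreePattern, 8);
}

// Called later (for example when the buffer is about to be reused):
// returns the offset of the first 8-byte word that no longer holds the
// pattern, or -1 if the buffer is intact.
long find_modified_offset(const unsigned char* buf, std::size_t size) {
    for (std::size_t off = 0; off + 8 <= size; off += 8)
        if (std::memcmp(buf + off, kFreePattern, 8) != 0)
            return static_cast<long>(off);
    return -1;
}

// Reads 8 bytes as one big-endian number, for reporting.
std::uint64_t read_be64(const unsigned char* p) {
    std::uint64_t v = 0;
    for (int i = 0; i < 8; ++i) v = (v << 8) | p[i];
    return v;
}
```

Nothing runs at the moment of the stray write itself, so the damage is
only noticed when find_modified_offset is next called on that buffer.
Checking every free buffer after every malloc/free would catch it sooner,
at the cost described above.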

This article:
http://access1.sun.com/techarticles/libumem.html
describes how to use gcore to make the process (running under libumem)
dump core and then run ::umem_verify and friends on the core. Alternatively,
attach to the process while it is still running but after the corruption
has happened, which is why I put in a big sleep at the end.

A final note.

Suppose I had other "stuff" instead of the sleep after the corruption:
a new, which would reuse the corrupt buffer.

"stuff":
  char * p3= new char[8];
  if (p3 == p1) printf("Yes. p1 == p3\n");

  for(int i=0;i<100;i++);

  return 0;

Oops, silly me! I left whatever I was going to do with the for loop
undone.
But regardless of that, umem dumped core (it looked like it was on
that new):

[jcoleman at slaine] ~/c/$ UMEM_DEBUG=default UMEM_LOGGING=transaction LD_PRELOAD=libumem.so.1 ~/c/testcorrupt
Abort (core dumped)

[jcoleman at slaine] ~/c/$ mdb core
Loading modules: [ libumem.so.1 libc.so.1 ld.so.1 ]
 > $c
libc.so.1`_kill+8(1, 64, 65640000, 7efefeff, 81010100, ff00)
libumem.so.1`umem_err_recoverable+0x74(ff360cac, fffffff7, ffffffff, 3b288b8f, 4bfb0, 3d720)
libumem.so.1`umem_error+0x49c(1, ff377010, 0, 49fe0, 3d780, 10)
libumem.so.1`umem_cache_alloc_debug+0xf0(3d6c8, 49fe0, 0, ff356d8c, 0, 0)
libumem.so.1`umem_cache_alloc+0x208(49fe0, 0, 0, 0, 0, 0)
libumem.so.1`umem_alloc+0x44(10, 0, 0, 0, 0, 0)
libumem.so.1`malloc+0x2c(8, ffbff0d8, 0, 0, ffbff188, ff1bc000)
libstdc++.so.6.0.3`_Znwj+0x1c(8, ffbff188, 0, 0, 0, ff19ff1c)
libstdc++.so.6.0.3`_Znaj+4(8, 10950, 49fe8, 34000000, 3400, 49fc8)
main+0x8c(1, ffbff2ac, ffbff2b4, 20bb0, 0, 0)
_start+0x5c(0, 0, 0, 0, 0, 0)

 > ::umem_status
Status:         ready and active
Concurrency:    1
Logs:           transaction=64k (inactive)
Message buffer:
umem allocator: buffer modified after being freed
modification occurred at offset 0x8 (0xdeadbeefdeadbeef replaced by 0x303132333400beef)
buffer=49fe0  bufctl=54780  cache: umem_alloc_16
previous transaction on buffer 49fe0:
thread=1  time=T-6.992512911  slab=4bfb0  cache: umem_alloc_16
libumem.so.1'umem_cache_free+0x4c
libumem.so.1'?? (0xff353868)
libumem.so.1'free+0x38
libstdc++.so.6.0.3'_ZdlPv+0x10
testcorrupt'main+0x28
testcorrupt'_start+0x5c
umem: heap corruption detected
stack trace:
libumem.so.1'?? (0xff3554c8)
libumem.so.1'?? (0xff356508)
libumem.so.1'umem_cache_alloc+0x208
libumem.so.1'umem_alloc+0x44
libumem.so.1'malloc+0x2c
libstdc++.so.6.0.3'_Znwj+0x1c
libstdc++.so.6.0.3'_Znaj+0x4
testcorrupt'main+0x8c
testcorrupt'_start+0x5c

Hooray.

Isn't that nice :)

James.

As a side-note I am using gnu g++ as a compiler. With -g3 for debug
info.

_______________________________________________
mdb-discuss mailing list
mdb-discuss at opensolaris.org
