On Thu, 30 Mar 2006, Claus Assmann wrote:
Is there some "simple" way to find a memory leak in some OS supplied library? I have a (constantly running) application that grows in a week from 5MB to 15MB in size (VSZ and RSS as reported by ps). The application can be compiled with an optional debugging memory allocator that tracks all (de)allocations to check whether any of its malloc()/free() calls leak memory; according to that tool the application behaves fine. Hence I'm wondering whether there is a memory leak in some library or the OS, which also could be triggered by the way my application uses it (see the recent thread about telldir()/seekdir()).
The approach that I used with smbd to find the telldir()/seekdir() leaks wasn't "simple", and it did involve some patience and trial and error, but I'm not sure it was particularly complicated either.

Basically, I recompiled smbd to link with the dmalloc library, which is designed to augment the system malloc()-type calls. The samba devel team thoughtfully includes developer code to support this -- I simply needed to recompile the app and turn it on. Then I ran the program with the test case that caused it to leak memory and watched the dmalloc output as smbd exited. This confirmed the leak; dmalloc catches leaks even if the underlying leaking malloc() is buried in a system library call -- or at least it did in this case.

Unfortunately, dmalloc only reported a stack return address to identify the culprit, so I ended up having to trace through the code and narrow the issue using dmalloc mark and reporting calls. If this is your own code, then this should be significantly easier than tracing samba code -- but I had to go through this step to understand what part of samba was leaking memory. At this point, my assumption was that it was something odd that samba was doing for my particular install.

Eventually, once I'd narrowed down the leaking code in samba, I attached to the process using gdb and determined where the return address on the stack was pointing. In my case, that was in the middle of telldir(). If you choose to try dmalloc (dmalloc.com), there are some very nice tutorials on their website for using a debugger to help track memory issues.

I also used the internal libc malloc() debug options to help confirm the memory leak, though I wasn't as successful at identifying the leak with them. They did provide another avenue to confirm that the app was leaking. There is test code floating around on the telldir() threads in tech@ that might give you a template for using them, though this may require a recompile of libc to turn on the MALLOC_STATS option.
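To make the "mark and report" narrowing step concrete, here is a minimal sketch of the idea in plain C. It is not dmalloc's API: the trk_* names, the fixed-size table and the tag argument are all hypothetical stand-ins, and the real dmalloc calls are documented on its website. The point is only the pattern: take a mark before a suspect region, run it, and then list whatever was allocated after the mark and is still live.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a debugging allocator. None of the trk_*
 * names come from dmalloc; they only illustrate the mark/report idea.
 */
#define TRK_MAX 1024            /* fixed table size, chosen arbitrarily */

struct trk_rec {
    void          *ptr;         /* NULL means the slot is unused */
    size_t         size;        /* requested size in bytes */
    unsigned long  serial;      /* allocation sequence number */
    const char    *tag;         /* caller-supplied label */
};

static struct trk_rec trk_tab[TRK_MAX];
static unsigned long  trk_serial;

/* Allocate and record the block; it goes untracked if the table is full. */
void *trk_malloc(size_t size, const char *tag)
{
    void *p = malloc(size);
    size_t i;

    if (p == NULL)
        return NULL;
    for (i = 0; i < TRK_MAX; i++) {
        if (trk_tab[i].ptr == NULL) {
            trk_tab[i].ptr = p;
            trk_tab[i].size = size;
            trk_tab[i].serial = ++trk_serial;
            trk_tab[i].tag = tag;
            break;
        }
    }
    return p;
}

/* Forget the block (if it was tracked) and release it. */
void trk_free(void *p)
{
    size_t i;

    if (p == NULL)
        return;
    for (i = 0; i < TRK_MAX; i++) {
        if (trk_tab[i].ptr == p) {
            trk_tab[i].ptr = NULL;
            break;
        }
    }
    free(p);
}

/* A mark is just the serial number of the most recent allocation. */
unsigned long trk_mark(void)
{
    return trk_serial;
}

/* Print every block allocated after 'mark' that is still live;
 * return how many there are. */
size_t trk_report_since(unsigned long mark, FILE *out)
{
    size_t i, n = 0;

    for (i = 0; i < TRK_MAX; i++) {
        if (trk_tab[i].ptr != NULL && trk_tab[i].serial > mark) {
            fprintf(out, "live: %lu bytes, tag=%s\n",
                (unsigned long)trk_tab[i].size, trk_tab[i].tag);
            n++;
        }
    }
    return n;
}
```

In use, you would call trk_mark() before the suspect code path, run it once, and call trk_report_since() with that mark afterwards. Moving the mark and the report closer together on each run is the narrowing process; a real tool like dmalloc does the same bookkeeping for every malloc() in the process, including those made inside libc.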
It may be simpler to read the malloc man page to find the easiest method for enabling the memory debugging code buried in libc. Maybe someone else on the list can give some insight into the "dump" results of the malloc() stats to see if there is a way to determine the caller, maybe in conjunction with gdb?

So, that's the approach that worked for me. There may be much simpler approaches and/or tools depending on the code you are working on. I'm far from an expert at this ...

Good hunting.

- Paul
My application uses pthreads and the DNS resolver, the latter by contacting it via UDP: sendto(), recvfrom(). Note: the memory leak seems to be unique to OpenBSD (3.8 and earlier), I can't reproduce it on SunOS 5.9 and others. That's why I'm asking for hints where to look for the leak: is there some "simple" way to show the allocated memory in the debugger or via system calls and to find out which functions made those allocations?