hints on debugging memory corruption...
Hello All

(Sorry for such a basic question; very probably there are some GDB tricks that I am unaware of.)

In my MELT branch I now have some corrupted memory (maybe because I am inserting a pass at the wrong place in the pass tree). At some point, I call bb_debug, and it crashes because the field bb_next contains 0x101 (which is not a valid address).

So I need to understand who is writing the 0x101 in that field.

How do you folks debug such issues?

An obvious strategy is to use the hardware watchpoint feature of GDB. However, one cannot nicely put a watchpoint on an address which is not mmap-ed yet.

But I don't know how to ask GDB to notify me when a given address becomes valid in the address space. Putting a breakpoint on mmap is really not fun.

Any hints are welcome!

Cheers
--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***
Re: hints on debugging memory corruption...
On 02/04/2011 07:42 AM, Basile Starynkevitch wrote:
> An obvious strategy is to use the hardware watchpoint feature of GDB.
> However, one cannot nicely put a watchpoint on an address which is not
> mmap-ed yet.

I typically find the location at which the object containing the address is allocated, e.g. the return statement in alloc_block, and put a breakpoint there. Make this breakpoint conditional on the object you're looking for.

r~
Re: hints on debugging memory corruption...
Basile Starynkevitch writes:

> In my MELT branch I now have some corrupted memory (maybe because I am
> inserting a pass at the wrong place in the pass tree). At some point, I call
> bb_debug, and it crashes because the field bb_next contains 0x101 (which is
> not a valid address).
>
> So I need to understand who is writing the 0x101 in that field.
>
> How do you folks debug such issues?
>
> An obvious strategy is to use the hardware watchpoint feature of GDB.
> However, one cannot nicely put a watchpoint on an address which is not
> mmap-ed yet.
>
> But I don't know how to ask GDB to notify me when a given address becomes
> valid in the address space. Putting a breakpoint on mmap is really not fun.

I usually put a breakpoint on the allocation routine and make it conditional on returning the address I am interested in. Once that address is returned, I can put a hardware watchpoint on the field whose value is being changed.

This approach is only moderately successful in practice.

Ian
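The approach rth and Ian describe can also be expressed inside the program, which helps when a conditional breakpoint on a hot allocation routine slows the run too much. The following is only a minimal sketch under stated assumptions: toy_alloc, allocation_trap, and address_of_interest are hypothetical names, the bump allocator merely stands in for the real allocation routine, and the allocation pattern is assumed to be identical from run to run so that an address noted in one run is meaningful in the next.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy bump allocator standing in for the real allocation routine
   (hypothetical; a real allocator is far more involved).  */
#define ARENA_SIZE 4096
static unsigned char arena[ARENA_SIZE];
static size_t arena_used = 0;

/* Address noted in an earlier, identical run.  Assumption: the
   allocation pattern is deterministic across runs.  */
static void *address_of_interest = NULL;
static int trap_hits = 0;

/* Put an unconditional breakpoint here; it is reached only when the
   allocator hands out the address of interest.  */
static void
allocation_trap (void *p)
{
  ++trap_hits;
  fprintf (stderr, "allocator returned watched address %p\n", p);
}

/* Hands out SIZE bytes (rounded up to a multiple of 16) from the arena,
   or NULL when the arena is exhausted.  */
static void *
toy_alloc (size_t size)
{
  void *result;

  assert (size > 0);
  size = (size + 15) & ~(size_t) 15;
  if (size > ARENA_SIZE - arena_used)
    return NULL;

  result = arena + arena_used;
  arena_used += size;

  /* Same test a conditional breakpoint on the return statement makes.  */
  if (result == address_of_interest)
    allocation_trap (result);
  return result;
}
```

Once the debugger stops in allocation_trap, the memory is known to be valid, so a hardware watchpoint on the suspicious field can be set at that point.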
Re: hints on debugging memory corruption...
Quoting Basile Starynkevitch :

> Hello All
>
> (Sorry for such a basic question; very probably there are some GDB tricks
> that I am unaware of.)
>
> In my MELT branch I now have some corrupted memory (maybe because I am
> inserting a pass at the wrong place in the pass tree). At some point, I call
> bb_debug, and it crashes because the field bb_next contains 0x101 (which is
> not a valid address).
>
> So I need to understand who is writing the 0x101 in that field.
>
> How do you folks debug such issues?
>
> An obvious strategy is to use the hardware watchpoint feature of GDB.
> However, one cannot nicely put a watchpoint on an address which is not
> mmap-ed yet.

One way is to do a bit of stepping, or to insert breakpoints in interesting places, and see when the address gets mapped.

If this is too tedious, and the memory corruption is in the heap and will still manifest itself after a bit of perturbation, you can put a breakpoint at main (or, if the corruption happens earlier, in a constructor or init function) and call free on the result of a malloc for a large memory area. For some OSes you might also need to malloc several medium-sized blocks and free them to actually get stuff mapped. Run to see where the corruption strikes now, then do a fresh start with the same allocation pattern and set your watchpoint.

For other allocators, you can hack the code a bit to make the first allocated chunk huge enough to contain the corrupted area.
Re: hints on debugging memory corruption...
2011/2/4 Basile Starynkevitch :
> An obvious strategy is to use the hardware watchpoint feature of GDB.
> However, one cannot nicely put a watchpoint on an address which is not
> mmap-ed yet.

Actually, you can do this with a recent enough GDB (7.1, AFAIK). It keeps the watchpoint disabled until the page is mapped in, then enables it automatically (though it might report false hits if the page is cleared, etc.).

However, I still like the method described by Ian and rth better.

--
Laurynas
[google] Merged google/integration into google/main
This merge brings all the integration patches. I've also created a ChangeLog.google-main to separate the changes from both branches. Tested on x86_64. Diego.
Re: hints on debugging memory corruption...
> "Basile" == Basile Starynkevitch writes:

Basile> So I need to understand who is writing the 0x101 in that field.

valgrind can sometimes catch this, assuming that the write is an invalid one.

Basile> An obvious strategy is to use the hardware watchpoint feature of GDB.
Basile> However, one cannot nicely put a watchpoint on an address which is not
Basile> mmap-ed yet.

I think a new-enough gdb should handle this ok.

rth> I typically find the location at which the object containing the address
rth> is allocated. E.g. in alloc_block on the return statement. Make this
rth> bp conditional on the object you're looking for.

I do this, too.

One thing to watch out for is that the memory can be recycled. I've been very confused whenever I've forgotten this.

I have a hack for the GC (appended -- ancient enough that it probably won't apply) that makes it easy to notice when an object you are interested in is collected. IIRC I apply this before the first run, call ggc_watch_object for the thing I am interested in, and then see in what GC cycle the real one is allocated.

Tom

Index: ggc-page.c
===
--- ggc-page.c (revision 127650)
+++ ggc-page.c (working copy)
@@ -430,6 +430,13 @@
   } *free_object_list;
 #endif
 
+  /* Watched objects.  */
+  struct watched_object
+  {
+    void *object;
+    struct watched_object *next;
+  } *watched_object_list;
+
 #ifdef GATHER_STATISTICS
   struct
   {
@@ -481,7 +488,7 @@
 /* Initial guess as to how many page table entries we might need.  */
 #define INITIAL_PTE_COUNT 128
 
-static int ggc_allocated_p (const void *);
+int ggc_allocated_p (const void *);
 static page_entry *lookup_page_table_entry (const void *);
 static void set_page_table_entry (void *, page_entry *);
 #ifdef USING_MMAP
@@ -549,7 +556,7 @@
 
 /* Returns nonzero if P was allocated in GC'able memory.  */
 
-static inline int
+int
 ggc_allocated_p (const void *p)
 {
   page_entry ***base;
@@ -1264,9 +1271,36 @@
              (unsigned long) size, (unsigned long) object_size, result,
              (void *) entry);
 
+  {
+    struct watched_object *w;
+    for (w = G.watched_object_list; w; w = w->next)
+      {
+        if (result == w->object)
+          {
+            fprintf (stderr, "re-returning watched object %p\n", w->object);
+            break;
+          }
+      }
+  }
+
   return result;
 }
 
+int
+ggc_check_watch (void *p, char *what)
+{
+  struct watched_object *w;
+  for (w = G.watched_object_list; w; w = w->next)
+    {
+      if (p == w->object)
+        {
+          fprintf (stderr, "got it: %s\n", what);
+          return 1;
+        }
+    }
+  return 0;
+}
+
 /* If P is not marked, marks it and return false.  Otherwise return true.
    P must have been allocated by the GC allocator; it mustn't point to
    static objects, stack variables, or memory allocated with malloc.  */
@@ -1293,6 +1327,19 @@
   if (entry->in_use_p[word] & mask)
     return 1;
 
+  {
+    struct watched_object *w;
+    for (w = G.watched_object_list; w; w = w->next)
+      {
+        if (p == w->object)
+          {
+            fprintf (stderr, "marking object %p; was %d\n", p,
+                     (int) (entry->in_use_p[word] & mask));
+            break;
+          }
+      }
+  }
+
   /* Otherwise set it, and decrement the free object count.  */
   entry->in_use_p[word] |= mask;
   entry->num_free_objects -= 1;
@@ -1337,6 +1384,15 @@
   return OBJECT_SIZE (pe->order);
 }
 
+void
+ggc_watch_object (void *p)
+{
+  struct watched_object *w = XNEW (struct watched_object);
+  w->object = p;
+  w->next = G.watched_object_list;
+  G.watched_object_list = w;
+}
+
 /* Release the memory for object P.  */
 
 void
@@ -1345,11 +1401,21 @@
   page_entry *pe = lookup_page_table_entry (p);
   size_t order = pe->order;
   size_t size = OBJECT_SIZE (order);
+  struct watched_object *w;
 
 #ifdef GATHER_STATISTICS
   ggc_free_overhead (p);
 #endif
 
+  for (w = G.watched_object_list; w; w = w->next)
+    {
+      if (w->object == p)
+        {
+          fprintf (stderr, "freeing watched object %p\n", p);
+          break;
+        }
+    }
+
   if (GGC_DEBUG_LEVEL >= 3)
     fprintf (G.debug_file,
              "Freeing object, actual size=%lu, at %p on %p\n",
@@ -1868,6 +1934,10 @@
 #define validate_free_objects()
 #endif
 
+int ggc_nc = 0;
+
+
+
 /* Top level mark-and-sweep routine.  */
 
 void
@@ -1903,6 +1973,21 @@
 
   clear_marks ();
   ggc_mark_roots ();
+
+  if (G.watched_object_list)
+    {
+      struct watched_object *w;
+      fprintf (stderr, "== starting collection %d\n", ggc_nc);
+      ++ggc_nc;
+      for (w = G.watched_object_list; w; w = w->next)
+        {
+          if (!ggc_marked_p (w->object))
+            {
+              fprintf (stderr, "object %p is free\n", w->object);
+            }
+        }
+    }
+
 #ifdef GATHER_STATISTICS
   ggc_prune_overhead_list ();
 #endif
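The core of the hack in Tom's patch is a small linked list of watched addresses that is consulted whenever memory is handed out, marked, or released. Stripped of the GCC specifics, the idea might look roughly like the sketch below. It is not Tom's code: watch_object, is_watched, and report_if_watched are hypothetical names, plain malloc replaces XNEW, and the list is a file-scope variable rather than a member of the collector's global state.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* One node per address we want to hear about.  */
struct watched_object
{
  void *object;
  struct watched_object *next;
};

static struct watched_object *watched_object_list = NULL;

/* Start watching P.  Returns 0 on success, -1 if out of memory.  */
static int
watch_object (void *p)
{
  struct watched_object *w = malloc (sizeof *w);

  if (w == NULL)
    return -1;
  w->object = p;
  w->next = watched_object_list;
  watched_object_list = w;
  return 0;
}

/* Returns 1 if P is on the watch list, 0 otherwise.  */
static int
is_watched (const void *p)
{
  const struct watched_object *w;

  for (w = watched_object_list; w; w = w->next)
    if (w->object == p)
      return 1;
  return 0;
}

/* Call from the allocator just before returning P, and from the release
   routine just before recycling P.  EVENT names what is happening, e.g.
   "re-returning" or "freeing".  Returns 1 if a message was printed.  */
static int
report_if_watched (const void *p, const char *event)
{
  assert (event != NULL);
  if (!is_watched (p))
    return 0;
  fprintf (stderr, "%s watched object %p\n", event, (void *) p);
  return 1;
}
```

A message from the release hook followed later by one from the allocation hook for the same address is the sign that the memory was recycled, which is the situation Tom warns about.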
Re: hints on debugging memory corruption...
Quoting Tom Tromey :

> "Basile" == Basile Starynkevitch writes:
>
> Basile> So I need to understand who is writing the 0x101 in that field.
>
> One thing to watch out for is that the memory can be recycled. I've been
> very confused whenever I've forgotten this.
>
> I have a hack for the GC (appended -- ancient enough that it probably won't
> apply) that makes it easy to notice when an object you are interested in is
> collected. IIRC I apply this before the first run, call ggc_watch_object for
> the thing I am interested in, and then see in what GC cycle the real one is
> allocated.

If what you are looking for survives such a change, postponing garbage collection so it won't happen till the crash can make things simpler.
[google] Separating ChangeLogs from different branches
While doing the integration->main merge today, I realized that I misnamed the ChangeLog.google files. If both branches name them the same, merges will be ugly and we won't be able to tell them apart easily. I renamed all the existing ChangeLog.google to ChangeLog.google-integration. We'll have a ChangeLog.google-main in the other branch. Diego.