> On 11/5/19 3:48 PM, Jan Hubicka wrote:
> > > >
> > > > stringpool.c:63 (alloc_node)              47M:  2.3%       0:  0.0%       0:  0.0%       0:  0.0%   1217k
> > > > ipa-prop.c:4480 (ipa_read_edge_info)      51M:  2.4%       0:  0.0%    260k:  0.0%    404k:  0.3%    531k
> > > > hash-table.h:801 (expand)                 81M:  3.9%       0:  0.0%     80M:  4.7%     88k:  0.1%    3349
> > > > ^^^ some of memory comes here which ought to be accounted to caller of
> > > > expand.
> > >
> > > Yes, these all come from ggc_internal_alloc. Ideally we should register a
> > > mem_alloc_description
> > > for each created symbol/call_summary and register manually every
> > > allocation to such descriptor.
> >
> > Or just pass memory stats from caller of expand and transitively pass it
> > from caller of summary. This will get us the line info of get_create
> > call that is IMO OK.
>
> The issue with this approach is that you will spread a summary allocation
> along all the ::get_create places. Which is not ideal.

We get it with other allocations, too. Not ideal, but better. Even
better solutions are welcome :)

> > Try to take a look, or we can debug that on Thursday together?
> Martin

Found it. It turns out that ggc_prune_overhead_list is bogus. It walks
all active allocation objects, checks whether each one was collected and,
if so, accounts its collection. Then it throws away all allocation
records (including those of objects that were not collected), so the
survivors are no longer accounted for later. So we basically misaccount
everything that survives ggc_collect. No wonder it made me hunt
ghosts 8-O

Also, the last memory report was sorted by garbage rather than leak for a
reason: for normal compilation we care primarily about the garbage
produced, because it triggers ggc collections and makes the compiler slow.

BTW I like how advanced C++ gets back to lisp :)

With the fix I get the following stats by the end of firefox WPA:

cfg.c:127 (alloc_block)                            32M:  1.2%     12M:  2.6%       0:  0.0%       0:  0.0%    446k
symtab.c:582 (create_reference)                    42M:  1.6%       0:  0.0%     65M:  1.7%   1329k:  0.4%    840k
gimple-streamer-in.c:101 (input_gimple_stmt)       49M:  1.9%     17M:  3.5%       0:  0.0%    375k:  0.1%    747k
tree-ssanames.c:308 (make_ssa_name_fn)             51M:  2.0%     16M:  3.4%       0:  0.0%       0:  0.0%    973k
ipa-cp.c:5157 (ipcp_store_vr_results)              51M:  2.0%   1243k:  0.2%       0:  0.0%   9561k:  3.0%    146k
stringpool.c:63 (alloc_node)                       53M:  2.0%       0:  0.0%       0:  0.0%       0:  0.0%   1362k
ipa-prop.c:3988 (duplicate)                        63M:  2.4%   1115k:  0.2%       0:  0.0%     10M:  3.2%    264k
toplev.c:904 (realloc_for_line_map)                72M:  2.8%       0:  0.0%     71M:  1.9%     15M:  5.1%      27
tree-ssanames.c:83 (init_ssanames)                 96M:  3.7%    121M: 24.4%     44M:  1.2%     87M: 27.8%    174k
tree-ssa-operands.c:265 (ssa_operand_alloc)       104M:  4.0%       0:  0.0%     39M:  1.0%       0:  0.0%    105k
stringpool.c:41 (stringpool_ggc_alloc)            106M:  4.1%       0:  0.0%       0:  0.0%   7652k:  2.4%   1362k
lto/lto-common.c:204 (lto_read_in_decl_state)     160M:  6.2%       0:  0.0%    105M:  2.8%     19M:  6.1%   1731k
cgraph.c:851 (create_edge)                        248M:  9.5%       0:  0.0%     70M:  1.9%       0:  0.0%   3141k
cgraph.h:2712 (allocate_cgraph_symbol)            383M: 14.7%       0:  0.0%    155M:  4.1%       0:  0.0%   1567k
tree-streamer-in.c:631 (streamer_alloc_tree)      718M: 27.5%    136M: 27.5%   1267M: 33.3%     64M: 20.6%     15M
------------------------------------------------------------------------------------------------------------------
GGC memory                                               Leak        Garbage          Freed       Overhead   Times
------------------------------------------------------------------------------------------------------------------
Total                                            2609M:100.0%    497M:100.0%   3804M:100.0%    313M:100.0%     49M
------------------------------------------------------------------------------------------------------------------

This looks more realistic. ssa_operands and init_ssanames show that we
read really a lot of bodies into memory. I also wonder if we really want
to compute virtual SSA form for them when we only want to compare them.

After reading and symbol table merging I get:

cgraph.h:2712 (allocate_cgraph_symbol)            148M:  7.1%       0:  0.0%    115M:  6.7%       0:  0.0%    767k

So it seems that about half of the callgraph nodes are inline clones, so
working on reducing clone overhead (in addition to revisiting tree
merging once again) seems to be the most meaningful thing right now.

OK if patch passes testing?

	* ggc-common.c (ggc_prune_overhead_list): Do not throw surviving
	memory allocations away.
	* mem-stats.h (mem_alloc_description<T>::release_object_overhead): Do
	not silently ignore invalid release requests.

Index: ggc-common.c
===================================================================
--- ggc-common.c	(revision 277796)
+++ ggc-common.c	(working copy)
@@ -1003,10 +1003,10 @@ ggc_prune_overhead_list (void)
 
   for (; it != ggc_mem_desc.m_reverse_object_map->end (); ++it)
     if (!ggc_marked_p ((*it).first))
-      (*it).second.first->m_collected += (*it).second.second;
-
-  delete ggc_mem_desc.m_reverse_object_map;
-  ggc_mem_desc.m_reverse_object_map = new map_t (13, false, false, false);
+      {
+        (*it).second.first->m_collected += (*it).second.second;
+        ggc_mem_desc.m_reverse_object_map->remove ((*it).first);
+      }
 }
 
 /* Return memory used by heap in kb, 0 if this info is not available.  */
Index: mem-stats.h
===================================================================
--- mem-stats.h	(revision 277796)
+++ mem-stats.h	(working copy)
@@ -535,11 +535,8 @@ inline void
 mem_alloc_description<T>::release_object_overhead (void *ptr)
 {
   std::pair <T *, size_t> *entry = m_reverse_object_map->get (ptr);
-  if (entry)
-    {
-      entry->first->release_overhead (entry->second);
-      m_reverse_object_map->remove (ptr);
-    }
+  entry->first->release_overhead (entry->second);
+  m_reverse_object_map->remove (ptr);
 }
 
 /* Unregister a memory allocation descriptor registered with
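
For illustration only, here is a minimal standalone sketch of the
misaccounting described above. It is not GCC code: Usage, ReverseMap,
MarkedSet, record, prune_old, prune_fixed, release and
collected_after_two_cycles are hypothetical names, and std::unordered_map
stands in for GCC's hash_map (so the fixed variant uses the iterator
returned by erase instead of calling remove inside the loop as the patch
does). The assert in release is likewise only an assumption about how one
might surface an invalid release request.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <unordered_set>
#include <utility>

// Hypothetical stand-in for the counters of one allocation site.
struct Usage {
  std::size_t allocated = 0;
  std::size_t collected = 0;
  std::size_t freed = 0;
};

// Hypothetical stand-in for the reverse object map: pointer -> (site, size).
using ReverseMap =
    std::unordered_map<const void *, std::pair<Usage *, std::size_t>>;
// Objects that survived the current collection.
using MarkedSet = std::unordered_set<const void *>;

void record(ReverseMap &map, Usage &site, const void *ptr, std::size_t size) {
  site.allocated += size;
  map[ptr] = {&site, size};
}

// Old behaviour: account collected objects, then drop every record,
// including the records of objects that are still alive.
void prune_old(ReverseMap &map, const MarkedSet &marked) {
  for (const auto &entry : map)
    if (!marked.count(entry.first))
      entry.second.first->collected += entry.second.second;
  map.clear();
}

// Patched behaviour: drop only the records of collected objects.
void prune_fixed(ReverseMap &map, const MarkedSet &marked) {
  for (auto it = map.begin(); it != map.end();) {
    if (!marked.count(it->first)) {
      it->second.first->collected += it->second.second;
      it = map.erase(it);
    } else {
      ++it;
    }
  }
}

// Explicit release; an untracked pointer is treated as a bug, not ignored.
void release(ReverseMap &map, const void *ptr) {
  auto it = map.find(ptr);
  assert(it != map.end() && "release of an untracked pointer");
  it->second.first->freed += it->second.second;
  map.erase(it);
}

// Object a (100 bytes) survives the first collection and dies before the
// second; object b (50 bytes) dies before the first. Returns the number of
// bytes the site ends up accounting as collected.
std::size_t collected_after_two_cycles(bool use_fixed_prune) {
  Usage site;
  ReverseMap map;
  int a = 0, b = 0;  // only their addresses are used, as map keys
  const void *pa = &a;
  const void *pb = &b;
  record(map, site, pa, 100);
  record(map, site, pb, 50);
  void (*prune)(ReverseMap &, const MarkedSet &) =
      use_fixed_prune ? prune_fixed : prune_old;
  prune(map, MarkedSet{pa});  // first collection: a is marked, b is garbage
  prune(map, MarkedSet{});    // second collection: a is garbage as well
  return site.collected;
}
```

With the old pruning the record for a is gone after the first collection,
so its 100 bytes are never accounted as collected and show up as leak
forever; with the fixed pruning all 150 bytes end up accounted.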