On Fri, 23 Oct 2020, Jan Hubicka wrote:

> > Hi,
> >
> > On Thu, Oct 22 2020, Jan Hubicka wrote:
> > > Hi,
> > > this patch removes the pass to materialize all clones and instead this
> > > is now done on demand.  The motivation is to reduce lifetime of function
> > > bodies in ltrans that should noticeably reduce memory use for highly
> > > parallel compilations of large programs (like Martin does) or with
> > > partitioning reduced/disabled.  For cc1 with one partition the memory use
> > > seems to go down from 4gb to cca 1.5gb (seeing from top, so this is not
> > > particularly accurate).
> > >
> >
> > Nice.
>
> Sadly this is only true w/o debug info.  I collected memory usage stats
> at the end of the ltrans stage and it is as follows
>
>  - after streaming in global stream: 126M GGC and 41M heap
>  - after streaming symbol table: 373M GGC and 92M heap
>  - after streaming in summaries: 394M GGC and 92M heap
>    (only large summary seems to be ipa-cp transformation summary)
>  - then compilation starts and memory goes slowly up to 3527M at the end
>    of compilation
>
> The following accounts for more than 1% GGC:
>
> Time variable                         usr            sys           wall           GGC
> ipa inlining heuristics  :    6.99 (  0%)    4.62 (  1%)   11.17 (  1%)   241M (  1%)
> ipa lto gimple in        :   50.04 (  3%)   29.72 (  7%)   80.22 (  4%)  3129M ( 14%)
> ipa lto decl in          :    0.79 (  0%)    0.36 (  0%)    1.15 (  0%)   135M (  1%)
> ipa lto cgraph I/O       :    0.95 (  0%)    0.20 (  0%)    1.15 (  0%)   269M (  1%)
> cfg cleanup              :   25.83 (  2%)    2.52 (  1%)   28.15 (  1%)   154M (  1%)
> df reg dead/unused notes :   24.08 (  2%)    2.09 (  1%)   26.77 (  1%)   180M (  1%)
> alias analysis           :   16.94 (  1%)    1.05 (  0%)   17.71 (  1%)   383M (  2%)
> integration              :   45.76 (  3%)   44.30 ( 11%)   88.99 (  5%)  2328M ( 10%)
> tree VRP                 :   41.38 (  3%)   15.67 (  4%)   57.71 (  3%)   560M (  2%)
> tree SSA rewrite         :    6.71 (  0%)    2.17 (  1%)    8.96 (  0%)   194M (  1%)
> tree SSA incremental     :   26.99 (  2%)    8.23 (  2%)   34.42 (  2%)   144M (  1%)
> tree operand scan        :   65.34 (  4%)   61.50 ( 15%)  127.02 (  7%)   886M (  4%)
> dominator optimization   :   41.53 (  3%)   13.56 (  3%)   55.78 (  3%)   407M (  2%)
> tree split crit edges    :    1.08 (  0%)    0.65 (  0%)    1.63 (  0%)   127M (  1%)
> tree PRE                 :   34.30 (  2%)   14.52 (  4%)   49.08 (  3%)   337M (  1%)
> tree code sinking        :    2.92 (  0%)    0.58 (  0%)    3.51 (  0%)   122M (  1%)
> tree iv optimization     :    6.71 (  0%)    1.19 (  0%)    8.46 (  0%)   133M (  1%)
> expand                   :   45.56 (  3%)    8.24 (  2%)   55.02 (  3%)  1980M (  9%)
> forward prop             :   11.89 (  1%)    1.39 (  0%)   12.59 (  1%)   130M (  1%)
> dead store elim2         :   10.03 (  1%)    0.70 (  0%)   11.23 (  1%)   138M (  1%)
> loop init                :   11.96 (  1%)    4.95 (  1%)   17.11 (  1%)   378M (  2%)
> CPROP                    :   22.63 (  2%)    2.78 (  1%)   25.19 (  1%)   359M (  2%)
> combiner                 :   41.39 (  3%)    2.57 (  1%)   43.30 (  2%)   558M (  2%)
> reload CSE regs          :   22.38 (  2%)    1.25 (  0%)   23.06 (  1%)   186M (  1%)
> final                    :   32.33 (  2%)    4.28 (  1%)   36.75 (  2%)  1105M (  5%)
> symout                   :   49.04 (  3%)    2.23 (  1%)   52.33 (  3%)  2517M ( 11%)
> var-tracking emit        :   33.26 (  2%)    1.02 (  0%)   34.35 (  2%)   582M (  3%)
> rest of compilation      :   38.05 (  3%)   15.61 (  4%)   52.42 (  3%)   114M (  1%)
> TOTAL                    : 1486.02         408.79        1899.96        22512M
>
> We seem to leak some hashtables:
> dwarf2out.c:28850 (dwarf2out_init)                  31M: 23.8%     47M     19 :  0.0%  ggc
that one likely keeps quite some memory live...

> cselib.c:3137 (cselib_init)                         34M: 25.9%     34M   1514k: 17.3%  heap
> tree-scalar-evolution.c:2984 (scev_initialize)      37M: 27.6%     50M    228k:  2.6%  ggc

Hmm, so we do

  scalar_evolution_info = hash_table<scev_info_hasher>::create_ggc (100);

and

  scalar_evolution_info->empty ();
  scalar_evolution_info = NULL;

to reclaim.  ->empty () will IIRC at least allocate 7 elements which we
then eventually should reclaim during a GC walk - I guess the hashtable
statistics do not really handle GC reclaimed portions?

If there's a friendlier way of releasing a GC allocated hash-tab
we can switch to that.  Note that in principle the hash-table doesn't
need to be GC allocated but it needs to be walked since it refers to
trees that might not be referenced in other ways.

> and hashmaps:
> ipa-reference.c:1133 (ipa_reference_read_optimiz  2047k:  3.0%   3071k      9 :  0.0%  heap
> tree-ssa.c:60 (redirect_edge_var_map_add)         4125k:  6.1%   4126k   8190 :  0.1%  heap

Similar to SCEV, probably mis-accounting?

> alias.c:1200 (record_alias_subset)                4510k:  6.6%   4510k   4546 :  0.0%  ggc
> ipa-prop.h:986 (ipcp_transformation_t)            8191k: 12.0%     11M     16 :  0.0%  ggc
> dwarf2out.c:5957 (dwarf2out_register_external_di    47M: 72.2%     71M     12 :  0.0%  ggc
>
> and hashsets:
> ipa-devirt.c:3093 (possible_polymorphic_call_tar    15k:  0.9%     23k      8 :  0.0%  heap
> ipa-devirt.c:1599 (add_type_duplicate)             412k: 22.2%    412k   4065 :  0.0%  heap
> tree-ssa-threadbackward.c:40 (thread_jumps)       1432k: 77.0%   1433k    119k:  0.8%  heap
>
> and vectors:
> tree-ssa-structalias.c:5783 (push_fields_onto_fi   8    847k: 0.3%    976k   475621: 0.8%    17k    24k

Huh.
It's an auto_vec<>

> tree-ssa-pre.c:334 (alloc_expression_id)          48   1125k: 0.4%   1187k   198336: 0.3%    23k    34k
> tree-into-ssa.c:1787 (register_new_update_single   8   1196k: 0.5%   1264k   380385: 0.6%    24k    36k
> ggc-page.c:1264 (add_finalizer)                    8   1232k: 0.5%   1848k       43: 0.0%    77k    81k
> tree-ssa-structalias.c:1609 (topo_visit)           8   1302k: 0.5%   1328k   892964: 1.4%    27k    33k
> graphds.c:254 (graphds_dfs)                        4   1469k: 0.6%   1675k  2101780: 3.4%    30k    34k
> dominance.c:955 (get_dominated_to_depth)           8   2251k: 0.9%   2266k   685140: 1.1%    46k    50k
> tree-ssa-structalias.c:410 (new_var_info)         32   2264k: 0.9%   2341k   330758: 0.5%    47k    63k
> tree-ssa-structalias.c:3104 (process_constraint)  48   2376k: 0.9%   2606k   405451: 0.7%    49k    83k
> symtab.c:612 (create_reference)                    8   3314k: 1.3%   4897k    75213: 0.1%   414k   612k
> vec.h:1734 (copy)                                 48    233M:90.5%    234M  6243163:10.1%  4982k  5003k

Those all look OK to me, not sure why we even think there's a leak?

> However main problem is
> cfg.c:202 (connect_src)                            5745k:  0.2%    271M:  1.9%   1754k:  0.0%   1132k:  0.2%   7026k
> cfg.c:212 (connect_dest)                           6307k:  0.2%    281M:  2.0%  10129k:  0.2%   2490k:  0.5%   7172k
> varasm.c:3359 (build_constant_desc)                7387k:  0.2%      0 :  0.0%      0 :  0.0%      0 :  0.0%     51k
> emit-rtl.c:486 (gen_raw_REG)                       7799k:  0.2%    215M:  1.5%     96 :  0.0%      0 :  0.0%   9502k
> dwarf2cfi.c:2341 (add_cfis_to_fde)                 8027k:  0.2%      0 :  0.0%   4906k:  0.1%   1405k:  0.3%     78k
> emit-rtl.c:4074 (make_jump_insn_raw)               8239k:  0.2%     93M:  0.7%      0 :  0.0%      0 :  0.0%   1442k
> tree-ssanames.c:308 (make_ssa_name_fn)             9130k:  0.2%    456M:  3.3%      0 :  0.0%      0 :  0.0%   6622k
> gimple.c:1808 (gimple_copy)                        9508k:  0.3%    524M:  3.7%   8609k:  0.2%   2972k:  0.6%   7135k
> tree-inline.c:4879 (expand_call_inline)            9590k:  0.3%     21M:  0.2%      0 :  0.0%      0 :  0.0%    328k
> dwarf2cfi.c:418 (new_cfi)                            10M:  0.3%      0 :  0.0%      0 :  0.0%      0 :  0.0%    444k
> cfg.c:266 (unchecked_make_edge)                      10M:  0.3%     60M:  0.4%    355M:  6.8%      0 :  0.0%   9083k
> tree.c:1642 (wide_int_to_tree_1)                     10M:  0.3%   2313k:  0.0%      0 :  0.0%      0 :  0.0%    548k
> stringpool.c:41 (stringpool_ggc_alloc)               10M:  0.3%   7055k:  0.0%      0 :  0.0%   2270k:  0.5%    588k
> stringpool.c:63 (alloc_node)                         10M:  0.3%     12M:  0.1%      0 :  0.0%      0 :  0.0%    588k
> tree-phinodes.c:119 (allocate_phi_node)              11M:  0.3%    153M:  1.1%      0 :  0.0%   3539k:  0.7%    340k
> cgraph.c:289 (create_empty)                          12M:  0.3%      0 :  0.0%    109M:  2.1%      0 :  0.0%    371k
> cfg.c:127 (alloc_block)                              14M:  0.4%    705M:  5.0%      0 :  0.0%      0 :  0.0%   7086k
> tree-streamer-in.c:558 (streamer_read_tree_bitfi     22M:  0.6%     13k:  0.0%      0 :  0.0%     22k:  0.0%     64k
> tree-inline.c:834 (remap_block)                      28M:  0.8%    159M:  1.1%      0 :  0.0%      0 :  0.0%   2009k
> stringpool.c:79 (ggc_alloc_string)                   28M:  0.8%   5619k:  0.0%      0 :  0.0%   6658k:  1.4%   1785k
> dwarf2out.c:11727 (add_ranges_num)                   32M:  0.9%      0 :  0.0%     32M:  0.6%    144 :  0.0%      20
> tree-inline.c:5942 (copy_decl_to_var)                39M:  1.1%     51M:  0.4%      0 :  0.0%      0 :  0.0%    646k
> tree-inline.c:5994 (copy_decl_no_change)             78M:  2.1%    270M:  1.9%      0 :  0.0%      0 :  0.0%   2497k
> function.c:4438 (reorder_blocks_1)                   96M:  2.6%    101M:  0.7%      0 :  0.0%      0 :  0.0%   2109k
> hash-table.h:802 (expand)                           142M:  3.9%     18M:  0.1%    198M:  3.8%     32M:  6.9%     38k
> dwarf2out.c:10086 (new_loc_list)                    219M:  6.0%     11M:  0.1%      0 :  0.0%      0 :  0.0%   2955k
> tree-streamer-in.c:637 (streamer_alloc_tree)        379M: 10.3%    426M:  3.0%      0 :  0.0%   4201k:  0.9%   9828k
> dwarf2out.c:5702 (new_die_raw)                      434M: 11.8%      0 :  0.0%      0 :  0.0%      0 :  0.0%   5556k
> dwarf2out.c:1383 (new_loc_descr)                    519M: 14.1%     12M:  0.1%   2880 :  0.0%      0 :  0.0%   6812k
> dwarf2out.c:4420 (add_dwarf_attr)                   640M: 17.4%      0 :  0.0%     94M:  1.8%   4584k:  1.0%   3877k
> toplev.c:906 (realloc_for_line_map)                 768M: 20.8%      0 :  0.0%    767M: 14.6%    255M: 54.4%      33
> --------------------------------------------------------------------------------------------------------------------
> GGC memory                                                 Leak        Garbage          Freed       Overhead   Times
> --------------------------------------------------------------------------------------------------------------------
> Total                                              3689M:100.0%  14039M:100.0%   5254M:100.0%    470M:100.0%    391M
> --------------------------------------------------------------------------------------------------------------------
>
> Clearly some function bodies leak - I will try to figure out what.  But
> main problem is debug info.
> I guess debug info for whole cc1plus is large, but it would be nice if
> it was not in the garbage collector, for example :)

Well, we're building a DIE tree for the whole unit here so I'm not sure
what parts we can optimize.  The structures may keep quite some stuff
on the tree side live through the decl -> DIE and block -> DIE maps
and the external_die_map used for LTO streaming (but if we lazily stream
bodies we do need to keep this map ... unless we add some
start/end-stream-body hooks and do the map per function.  But then
we build the DIEs lazily as well so the query of the map is lazy :/)

Richard.

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend