On Fri, 28 Sep 2018, Jan Hubicka wrote: > Hi, > this is a proof-of-concept patch for type merging during LTO streaming. It > does two things > 1) replace type variant by first compatible one in TYPE_NEXT_VARIANT list > This is useful at compilation time because frontends produce more variants > than needed. The effect of this is about 2% of decl stream > 2) replace ODR types by their prevailing variant if ODR violation was not > detected. This is useful at WPA time and saves about 50% of global decl > stream. For Firefox it reduces the ltrans.o files size from 2.6GB to 2.0GB. > > Here are stats of number of nodes hitting global stream during WPA->ltrans > streaming of firefox: > > Before after > 7354 7280 namespace_decl > 46593 14084 union_type > 18496 15405 translation_unit_decl > 22816 21548 integer_type > 107047 31681 enumeral_type > 67072 46390 array_type > 541220 99572 pointer_plus_expr > 542360 100712 addr_expr > 167657 154019 var_decl > 864769 182691 tree_binfo > 240200 206783 reference_type > 410403 316877 function_type > 1862985 522524 type_decl > 1954864 652664 record_type > 1070333 880209 method_type > 1582055 1014270 pointer_type > 1495367 1406670 tree_list > 7926385 1483545 field_decl > 1715133 1725612 function_decl > 3384810 3191406 identifier_node > > So largest savings are in field_decls (to 18% and they are not even merged > fully by the patch), pointer tpes (to 64%), record types (to 30%) and type > decls (to 28%), binfos addr_expr/pointer_plus_expr (the later are used to > reffer to vtables to 21%). Patch bootstraps, regtestes and seems to > work reliably. > > The merging is currently done during streaming by simply looking up prevailing > type and populating the cache (so we do not get too many repeated lookups). > Similar tricks can be added to checksum calculation. > > I am not sure this is best possible place, especially for 2) since it provides > really quite major reduction in the IL size. I am quite sure we do not want > to do that at stream in time in parallel to SCC merging because it affects the > SCC components and also we do not want to merge this way types containing ODR > violations as QOI issue. (merging two completely different types would lead > to ICEs but merging two TBAA incompatible types is probably equally bad). > > So we need to do following > a) read in the the types & do tree mering > b) populate odr type hash, do ODR violation checks and decide on what types > are consistent > c) do the ODR based merging sometime before the trees hit ltrans stream. > > I was thinking that perhaps c) can also be added to mentions_vars_p machinery > which would make it somewhat more costy but we could free those duplicates and > save some more memory during WPA. I am also not usre if longer term we > would not want to have mode that bypasses WPA->ltrans stream completely and > just compile out of WPA process in threads or forks. > Drawback is that building vector recording every single type pointer is going > to also consume quite some memory. > > Patch is also not complete because it does not merge field decls. Analogous > tricks can be added to streamer, but I just tought I would like to discuss > alternative implementations first. > > WPA compile time improves by about 11%, so merging benefits overcome the extra > complexity in sreamer. > > I hope that with this patch I will be able to increase default number of > partitions > since the global stream is getting less percentage of the ltrans files. 33% > is still quite a lot, but far better than what we had previously. Hopefully > we could still dramatically cut this down in followup. > > Also obvious drawback is that this helps to C++ only. I looked into stats from > C programs and also Toon's stats on Fortran and I tend to believe that C++ > is really much worse than those two.
The ODR savings really look good but as you say the implementation is somewhat "tricky". I'd like to see the type-variant done separately (of course) and also differently. This is because when enabling free-lang-data by default we should be able to get benefits for non-LTO compilations as well. Specifically I'd like us to (in free-lang-data) "canonicalize" existing variants applying the rules you derived for the lazier compare. At the point you then drop non-fld-walked variants we could weed out the resulting duplicates, keeping only a prevailing one in the list and (ab-)using GC to fixup references. Like with ggc_register_fixup (void *from, void *to) which would when following edges at mark time, adjust them according to this map (I'm not sure if we'd want to do it manually and record a vector of possibly affected TREE_TYPE uses during the fld walk). We could also (easily?) optimize the streaming itself by streaming only differences to the main variant of a type (but then we could change our tree data structure to make that trivial). I wonder how much "ODR" merging we'd get for free when we canonicalize types some more - that is, how do ODR equal but tree unequal types usually differ? I guess it might be most of the time fields with pointer to incomplete vs. complete type? Thanks, Richard. > Honza > > Index: ipa-devirt.c > =================================================================== > --- ipa-devirt.c (revision 264689) > +++ ipa-devirt.c (working copy) > @@ -2111,6 +2111,29 @@ get_odr_type (tree type, bool insert) > return val; > } > > +/* Return the main variant of the odr type. This is used for straming out > + to reduce number of type duplicates hitting the WPA->LTRANS streams. > + Do not do so when ODR violation was detected since the type may be > + structurally different then. */ > + > +tree > +prevailing_odr_type (tree t) > +{ > + t = TYPE_MAIN_VARIANT (t); > + /* In need_assembler_name_p we also mangle assembler names of INTEGER_TYPE. > + We can not merge these because this does not honnor precision and > + signedness. */ > + if (!type_with_linkage_p (t) > + || type_in_anonymous_namespace_p (t) > + || TREE_CODE (t) == INTEGER_TYPE > + || !COMPLETE_TYPE_P (t)) > + return t; > + odr_type ot = get_odr_type (t, true); > + if (!ot || !ot->odr_violated) > + return ot->type; > + return t; > +} > + > /* Add TYPE od ODR type hash. */ > > void > Index: ipa-utils.h > =================================================================== > --- ipa-utils.h (revision 264689) > +++ ipa-utils.h (working copy) > @@ -90,6 +90,7 @@ void warn_types_mismatch (tree t1, tree > location_t loc2 = UNKNOWN_LOCATION); > bool odr_or_derived_type_p (const_tree t); > bool odr_types_equivalent_p (tree type1, tree type2); > +tree prevailing_odr_type (tree t); > > /* Return vector containing possible targets of polymorphic call E. > If COMPLETEP is non-NULL, store true if the list is complete. > Index: lto/lto.c > =================================================================== > --- lto/lto.c (revision 264689) > +++ lto/lto.c (working copy) > @@ -485,6 +485,8 @@ gimple_register_canonical_type_1 (tree t > static void > gimple_register_canonical_type (tree t) > { > + if (flag_checking) > + verify_type (t); > if (TYPE_CANONICAL (t) || !type_with_alias_set_p (t) > || !canonical_type_used_p (t)) > return; > Index: lto-streamer-out.c > =================================================================== > --- lto-streamer-out.c (revision 264689) > +++ lto-streamer-out.c (working copy) > @@ -43,6 +43,8 @@ along with GCC; see the file COPYING3. > #include "debug.h" > #include "omp-offload.h" > #include "print-tree.h" > +#include "attribs.h" > +#include "ipa-utils.h" > > > static void lto_write_tree (struct output_block*, tree, bool); > @@ -109,14 +111,188 @@ destroy_output_block (struct output_bloc > free (ob); > } > > +/* Do same comparsion as check_qualified_type skipping lang part of type > + and be more permissive about type names: we only care that names are > + same (for diagnostics) and that ODR names are the same. */ > > -/* Look up NODE in the type table and write the index for it to OB. */ > +static bool > +types_equal_p (tree t, tree v) > +{ > + if (TYPE_QUALS (t) != TYPE_QUALS (v)) > + return false; > + > + if (TYPE_NAME (t) != TYPE_NAME (v) > + && (!TYPE_NAME (t) || !TYPE_NAME (v) > + || TREE_CODE (TYPE_NAME (t)) != TYPE_DECL > + || TREE_CODE (TYPE_NAME (v)) != TYPE_DECL > + || DECL_ASSEMBLER_NAME_RAW (TYPE_NAME (t)) > + != DECL_ASSEMBLER_NAME_RAW (TYPE_NAME (v)) > + || DECL_NAME (TYPE_NAME (t)) != DECL_NAME (TYPE_NAME (v)))) > + return false; > + > + if (TYPE_ALIGN (t) != TYPE_ALIGN (v)) > + return false; > + > + if (!attribute_list_equal (TYPE_ATTRIBUTES (t), > + TYPE_ATTRIBUTES (v))) > + return false; > + > + /* Do not replace complete type by incomplete. */ > + if (COMPLETE_TYPE_P (t) != COMPLETE_TYPE_P (v)) > + return false; > + > + if (TYPE_ADDR_SPACE (t) != TYPE_ADDR_SPACE (v)) > + return false; > + > + gcc_assert (TREE_CODE (t) == TREE_CODE (v)); > + > + /* For pointer types and array types we also care about the type they > + reffer to. */ > + if (TREE_TYPE (t)) > + return types_equal_p (TREE_TYPE (t), TREE_TYPE (v)); > + > + return true; > +} > + > +/* Search list of type variants and look up first one that looks same. > + At compile time, this removes duplicates of types created by front-ends. > + At WPA time we also merge duplicated ODR types. */ > + > +static tree > +first_compatible_type_variant (tree t) > +{ > + tree first = NULL; > + if (POINTER_TYPE_P (t)) > + { > + tree t2 = first_compatible_type_variant (TREE_TYPE (t)); > + if (t2 != TREE_TYPE (t)) > + { > + if (TREE_CODE (t) == POINTER_TYPE) > + first = build_pointer_type_for_mode (t2, TYPE_MODE (t), > + TYPE_REF_CAN_ALIAS_ALL (t)); > + else > + first = build_reference_type_for_mode (t2, TYPE_MODE (t), > + TYPE_REF_CAN_ALIAS_ALL (t)); > + } > + else > + first = TYPE_MAIN_VARIANT (t); > + gcc_assert (TYPE_MAIN_VARIANT (first) == first); > + } > + else > + first = prevailing_odr_type (t); > + > + gcc_checking_assert (gimple_canonical_types_compatible_p (t, first, > false)); > + > + for (tree v = first; v; v = TYPE_NEXT_VARIANT (v)) > + if (types_equal_p (t, v)) > + { > + gcc_checking_assert (gimple_canonical_types_compatible_p (t, v, false)); > + gcc_checking_assert (!in_lto_p || TYPE_CANONICAL (t) == TYPE_CANONICAL > (v)); > + gcc_checking_assert (get_alias_set (t) == get_alias_set (v)); > + return v; > + } > + > + /* Most of the time we will find v itself in the list. With ODR type > merging > + we however merge across TYPE_MAIN_VARIANTs and in that case we may not > + have corresponding type in the list. */ > + tree v = build_variant_type_copy (first); > + TYPE_READONLY (v) = TYPE_READONLY (t); > + TYPE_VOLATILE (v) = TYPE_VOLATILE (t); > + TYPE_ATOMIC (v) = TYPE_ATOMIC (t); > + TYPE_RESTRICT (v) = TYPE_RESTRICT (t); > + TYPE_ADDR_SPACE (v) = TYPE_ADDR_SPACE (t); > + SET_TYPE_ALIGN (v, TYPE_ALIGN (t)); > + TYPE_NAME (v) = TYPE_NAME (t); > + TYPE_ATTRIBUTES (v) = TYPE_ATTRIBUTES (t); > + TREE_TYPE (v) = TREE_TYPE (t); > + gcc_checking_assert (types_equal_p (t, v)); > + gcc_checking_assert (gimple_canonical_types_compatible_p (t, v, false)); > + gcc_checking_assert (!in_lto_p || TYPE_CANONICAL (t) == TYPE_CANONICAL > (v)); > + gcc_checking_assert (get_alias_set (t) == get_alias_set (v)); > + return v; > +} > + > + > +/* Look up NODE in the type table and write the index for it to OB. > + Eliminate unnecesary type variants. */ > > static void > output_type_ref (struct output_block *ob, tree node) > { > + bool existed_p; > + unsigned int &index > + = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].tree_hash_table-> > + get_or_insert (node, &existed_p); > streamer_write_record_start (ob, LTO_type_ref); > - lto_output_type_ref_index (ob->decl_state, ob->main_stream, node); > + > + if (existed_p) > + { > + streamer_write_uhwi_stream (ob->main_stream, index); > + return; > + } > + > + tree node2 = first_compatible_type_variant (node); > + > + /* If NODE is first variant, just add it into encoder. */ > + if (node2 == node) > + { > + index = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.length (); > + if (streamer_dump_file) > + { > + print_node_brief (streamer_dump_file, > + " Encoding indexable ", node, 4); > + fprintf (streamer_dump_file, " as %i\n", index); > + } > + ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.safe_push (node); > + streamer_write_uhwi_stream (ob->main_stream, index); > + return; > + } > + > + index = 0xdead; > + > + if (streamer_dump_file) > + { > + print_node_brief (streamer_dump_file, " Type ", node, 4); > + fprintf (streamer_dump_file, " prevailed by "); > + print_node_brief (streamer_dump_file, "", node2, 4); > + fprintf (streamer_dump_file, "\n"); > + } > + > + /* See if we already assgined one to the first variant. If that is the > case > + only duplicate the entry in the hashtable so we do not need to call > + first_compatible_type_variant redundantly. */ > + unsigned int &index2 > + = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].tree_hash_table-> > + get_or_insert (node2, &existed_p); > + /* The reference may be changed by hash table expansion. */ > + unsigned int i = index2; > + > + if (existed_p) > + { > + ob->decl_state->streams[LTO_DECL_STREAM_TYPE]. > + tree_hash_table->get_or_insert (node, &existed_p) = i; > + streamer_write_uhwi_stream (ob->main_stream, i); > + return; > + } > + else > + { > + index2 = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.length (); > + i = index2; > + if (streamer_dump_file) > + { > + print_node_brief (streamer_dump_file, > + " Encoding indexable ", node2, 4); > + fprintf (streamer_dump_file, " as %i \n", i); > + } > + ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.safe_push (node2); > + /* We need to search again to watch hash table resize. */ > + ob->decl_state->streams[LTO_DECL_STREAM_TYPE]. > + tree_hash_table->get_or_insert (node, &existed_p) = i; > + gcc_assert (existed_p); > + streamer_write_uhwi_stream (ob->main_stream, i); > + } > + return; > } > > > @@ -978,12 +1171,15 @@ hash_tree (struct streamer_tree_cache_d > #define visit(SIBLING) \ > do { \ > unsigned ix; \ > - if (!SIBLING) \ > + tree t2 = SIBLING; \ > + if (t2 && TYPE_P (t2)) \ > + t2 = first_compatible_type_variant (t2); \ > + if (!t2) \ > hstate.add_int (0); \ > - else if (streamer_tree_cache_lookup (cache, SIBLING, &ix)) \ > + else if (streamer_tree_cache_lookup (cache, t2, &ix)) \ > hstate.add_int (streamer_tree_cache_get_hash (cache, ix)); \ > else if (map) \ > - hstate.add_int (*map->get (SIBLING)); \ > + hstate.add_int (*map->get (t2)); \ > else \ > hstate.add_int (1); \ > } while (0) > @@ -1554,6 +1750,10 @@ DFS::DFS_write_tree (struct output_block > if (this_ref_p && tree_is_indexable (expr)) > return; > > + /* Replace type by its prevailing variant. */ > + if (TYPE_P (expr)) > + expr = first_compatible_type_variant (expr); > + > /* Check if we already streamed EXPR. */ > if (streamer_tree_cache_lookup (ob->writer_cache, expr, NULL)) > return; > @@ -1591,6 +1791,11 @@ lto_output_tree (struct output_block *ob > return; > } > > + if (TYPE_P (expr)) > + expr = first_compatible_type_variant (expr); > + > + gcc_checking_assert (!this_ref_p || !tree_is_indexable (expr)); > + > existed_p = streamer_tree_cache_lookup (ob->writer_cache, expr, &ix); > if (existed_p) > { > @@ -2334,7 +2539,18 @@ copy_function_or_variable (struct symtab > gcc_assert (lto_tree_ref_encoder_size (encoder) == 0); > encoder->trees.reserve_exact (n); > for (j = 0; j < n; j++) > - encoder->trees.safe_push ((*trees)[j]); > + { > + tree t = (*trees)[j]; > + if (TYPE_P (t)) > + t = first_compatible_type_variant (t); > + if (streamer_dump_file) > + { > + print_node_brief (streamer_dump_file, > + " Adding reference to ", t, 4); > + fprintf (streamer_dump_file, " as %i \n", (int)j); > + } > + encoder->trees.safe_push (t); > + } > } > > lto_free_raw_section_data (file_data, LTO_section_function_body, name, > @@ -2507,6 +2723,8 @@ write_global_stream (struct output_block > print_node_brief (streamer_dump_file, "", t, 4); > fprintf (streamer_dump_file, "\n"); > } > + gcc_checking_assert (!TYPE_P (t) > + || t == first_compatible_type_variant (t)); > if (!streamer_tree_cache_lookup (ob->writer_cache, t, NULL)) > stream_write_tree (ob, t, false); > } > @@ -2537,6 +2755,8 @@ write_global_references (struct output_b > t = lto_tree_ref_encoder_get_tree (encoder, index); > streamer_tree_cache_lookup (ob->writer_cache, t, &slot_num); > gcc_assert (slot_num != (unsigned)-1); > + gcc_checking_assert (!TYPE_P (t) > + || t == first_compatible_type_variant (t)); > data[index + 1] = slot_num; > } > > Index: tree-ssa-alias.c > =================================================================== > --- tree-ssa-alias.c (revision 264689) > +++ tree-ssa-alias.c (working copy) > @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3. > #include "tree-dfa.h" > #include "ipa-reference.h" > #include "varasm.h" > +#include "ipa-utils.h" > > /* Broad overview of how alias analysis on gimple works: > > @@ -955,6 +956,12 @@ nonoverlapping_component_refs_of_decl_p > ??? Bitfields can overlap at RTL level so punt on them. */ > if (DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2)) > return false; > + /* In LTO we may end up merging different variants of ODR types > + but keep corresponding field decls unmerged. */ > + if (in_lto_p > + && type_with_linkage_p (type1) > + && DECL_NAME (field1) == DECL_NAME (field2)) > + return false; > return true; > } > } > @@ -1049,7 +1056,9 @@ nonoverlapping_component_refs_p (const_t > { > /* We're left with accessing different fields of a structure, > no possible overlap. */ > - if (fieldx != fieldy) > + if (fieldx != fieldy > + && (in_lto_p || !type_with_linkage_p (typex) > + || DECL_NAME (fieldx) == DECL_NAME (fieldy))) > { > /* A field and its representative need to be considered the > same. */ > Index: tree.c > =================================================================== > --- tree.c (revision 264689) > +++ tree.c (working copy) > @@ -5845,7 +5861,12 @@ free_lang_data_in_cgraph (void) > > /* Traverse every type found freeing its language data. */ > FOR_EACH_VEC_ELT (fld.types, i, t) > - free_lang_data_in_type (t); > + { > + free_lang_data_in_type (t); > + while (TYPE_NEXT_VARIANT (t) > + && !fld.pset.contains (TYPE_NEXT_VARIANT (t))) > + TYPE_NEXT_VARIANT (t) = TYPE_NEXT_VARIANT (TYPE_NEXT_VARIANT (t)); > + } > if (flag_checking) > { > FOR_EACH_VEC_ELT (fld.types, i, t) > > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)