On Fri, 11 Jul 2014, Jan Hubicka wrote:

> Hi,
> since we both agreed that offlining constructors from the global decl stream
> is a good idea, I went ahead and implemented it. I would like to follow up
> with cleanups; for example, the sections are still tagged as function
> sections, but I would like to do that incrementally. There is quite some
> ugliness in the way we handle function sections, and the patch started to
> snowball very quickly.
>
> The patch conceptually copies what we do for functions and re-uses most of
> the infrastructure. varpool_get_constructor is the counterpart of
> cgraph_get_body (i.e. the means of reading the body in), and it is used by
> the output machinery, by ipa-visibility while rewriting the constructor, and
> by ctor_for_folding (which makes us load the ctor whenever it is needed by
> ipa-cp or ipa-devirt).
>
> I kept get_symbol_initial_value as the authority that decides whether we want
> to encode a given constructor or not. The section itself for a trivial ctor
> is about 25 bytes, and with the header it is probably close to double that.
> Currently the heuristic is to offline only initializers that are
> CONSTRUCTORs and keep simple expressions inline. We may want to tweak it.
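The mechanism Jan describes (a sentinel left in DECL_INITIAL that is replaced by the real constructor the first time something asks for it, plus a heuristic that offlines only CONSTRUCTORs) can be modeled outside GCC. The C sketch below is illustrative only: every name in it (`toy_init`, `toy_var`, `OFFLINED`, `toy_should_offline`, `toy_stream_out`, `toy_get_constructor`) is hypothetical, the sentinel plays the role of error_mark_node, and the "section read" is a counter rather than real LTO I/O.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an initializer tree: either a simple scalar
   expression or an aggregate CONSTRUCTOR.  */
enum toy_init_kind { TOY_SCALAR, TOY_CONSTRUCTOR };

struct toy_init
{
  enum toy_init_kind kind;
  int value;
};

/* Sentinel playing the role of error_mark_node in DECL_INITIAL:
   "the initializer exists but lives in its own section, not read yet".  */
static const struct toy_init offlined_sentinel = { TOY_CONSTRUCTOR, -1 };
#define OFFLINED (&offlined_sentinel)

/* Hypothetical stand-in for a varpool node.  */
struct toy_var
{
  const struct toy_init *initial;  /* Like DECL_INITIAL.  */
  const struct toy_init *on_disk;  /* Contents of the per-variable section.  */
  int sections_read;               /* Number of simulated section reads.  */
};

/* Heuristic modeled on the patch: only aggregate CONSTRUCTORs get their
   own section; scalar initializers stay inline in the global stream.  */
static bool
toy_should_offline (const struct toy_init *init)
{
  return init != NULL && init->kind == TOY_CONSTRUCTOR;
}

/* Writer side: remember the initializer and decide what the global
   decl stream sees.  */
static void
toy_stream_out (struct toy_var *var, const struct toy_init *init)
{
  var->on_disk = init;
  var->initial = toy_should_offline (init) ? OFFLINED : init;
  var->sections_read = 0;
}

/* Reader side, in the spirit of varpool_get_constructor: return the
   initializer, reading its section only on first use.  */
static const struct toy_init *
toy_get_constructor (struct toy_var *var)
{
  if (var->initial != OFFLINED)
    return var->initial;
  /* The patch calls fatal_error when the section is missing.  */
  assert (var->on_disk != NULL);
  var->sections_read++;  /* Simulates reading and freeing the section.  */
  var->initial = var->on_disk;
  return var->initial;
}
```

In this model, asking twice for an offlined aggregate costs a single simulated section read, while a scalar-initialized variable never touches a section, mirroring how the patch keeps simple expressions inline.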

Hmm, so what about an artificial testcase with gazillions of

struct X { int i; };
struct X a0001 = { 1 };
struct X a0002 = { 2 };
....

how much does it blow up LTO IL size and streaming time (compile-out
and LTRANS in)? I suppose it still helps the WPA stage.

Also, what we desperately miss is putting CONST_DECLs into the symbol
table (and thus eventually moving the constant pool to symtab). That,
and no longer allowing STRING_CSTs in the IL but only CONST_DECLs with
STRING_CST initializers (to fix PR50199).

> The patch does not bring miraculous savings to firefox WPA, but it does some:
>
> GGC memory after global stream is read goes from 1376898k to 1250533k
> overall GGC allocations from 4156478 kB to 4012462 kB
> read 11006599 SCCs of average size 1.907692 -> read 9119433 SCCs of average size 2.037867
> 20997206 tree bodies read in total -> 18584194 tree bodies read in total
> Size of mmap'd section decls: 299540188 bytes -> Size of mmap'd section decls: 271557265 bytes
> Size of mmap'd section function_body: 5711078 bytes -> Size of mmap'd section function_body: 7548680 bytes
>
> Things would be better if ipa-visibility and ipa-devirt did not load most of
> the virtual tables into memory (still better than loading each into memory 20
> times on average). I will work on that incrementally. We load 10311 ctors
> into memory at WPA time.
>
> Note that firefox seems to feature a really huge data segment these days.
> http://hubicka.blogspot.ca/2014/04/linktime-optimization-in-gcc-2-firefox.html
>
> Bootstrapped/regtested x86_64-linux, tested with firefox, lto bootstrap
> in progress, OK?

The patch looks ok to me. How about simply doing
s/LTO_section_function_body/LTO_section_symbol_content/ instead of
adding LTO_section_variable_initializer?

Thanks,
Richard.

> 	* varpool.c: Include tree-ssa-alias.h, gimple.h and lto-streamer.h.
> 	(varpool_get_constructor): New function.
> 	(ctor_for_folding): Use it.
> 	(varpool_assemble_decl): Likewise.
> 	* lto-streamer.h (struct output_block): Turn cgraph_node
> 	to symbol field.
> 	(lto_input_variable_constructor): Declare.
> 	* ipa-visibility.c (function_and_variable_visibility): Use
> 	varpool_get_constructor.
> 	* cgraph.h (varpool_get_constructor): Declare.
> 	* lto-streamer-out.c (get_symbol_initial_value): Take encoder
> 	parameter; return error_mark_node for non-trivial constructors.
> 	(lto_write_tree_1, DFS_write_tree): Update use of
> 	get_symbol_initial_value.
> 	(output_function): Update initialization of symbol.
> 	(output_constructor): New function.
> 	(copy_function): Rename to ...
> 	(copy_function_or_variable): ... this one; handle vars too.
> 	(lto_output): Output variable sections.
> 	* lto-streamer-in.c (input_constructor): New function.
> 	(lto_read_body): Rename to ...
> 	(lto_read_body_or_constructor): ... this one; handle vars
> 	too.
> 	(lto_input_variable_constructor): New function.
> 	* ipa-prop.c (ipa_prop_write_jump_functions,
> 	ipa_prop_write_all_agg_replacement): Update.
>
> Index: varpool.c
> ===================================================================
> --- varpool.c	(revision 212426)
> +++ varpool.c	(working copy)
> @@ -35,6 +35,9 @@ along with GCC; see the file COPYING3.
>  #include "gimple-expr.h"
>  #include "flags.h"
>  #include "pointer-set.h"
> +#include "tree-ssa-alias.h"
> +#include "gimple.h"
> +#include "lto-streamer.h"
>
>  const char * const tls_model_names[]={"none", "tls-emulated", "tls-real",
>                                        "tls-global-dynamic", "tls-local-dynamic",
> @@ -253,6 +256,41 @@ varpool_node_for_asm (tree asmname)
>    return NULL;
>  }
>
> +/* When doing LTO, read NODE's constructor from disk if it is not already
> +   present.  */
> +
> +tree
> +varpool_get_constructor (struct varpool_node *node)
> +{
> +  struct lto_file_decl_data *file_data;
> +  const char *data, *name;
> +  size_t len;
> +  tree decl = node->decl;
> +
> +  if (DECL_INITIAL (node->decl) != error_mark_node
> +      || !in_lto_p)
> +    return DECL_INITIAL (node->decl);
> +
> +  file_data = node->lto_file_data;
> +  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
> +
> +  /* We may have renamed the declaration, e.g., a static function.  */
> +  name = lto_get_decl_name_mapping (file_data, name);
> +
> +  data = lto_get_section_data (file_data, LTO_section_function_body,
> +                               name, &len);
> +  if (!data)
> +    fatal_error ("%s: section %s is missing",
> +                 file_data->file_name,
> +                 name);
> +
> +  lto_input_variable_constructor (file_data, node, data);
> +  lto_stats.num_function_bodies++;
> +  lto_free_section_data (file_data, LTO_section_function_body, name,
> +                         data, len);
> +  lto_free_function_in_decl_state_for_node (node);
> +  return DECL_INITIAL (node->decl);
> +}
> +
>  /* Return if DECL is constant and its initial value is known (so we can do
>     constant folding using DECL_INITIAL (decl)).
>     Return ERROR_MARK_NODE when value is unknown.  */
> @@ -314,6 +352,9 @@ ctor_for_folding (tree decl)
>    if (DECL_VIRTUAL_P (real_decl))
>      {
>        gcc_checking_assert (TREE_READONLY (real_decl));
> +      if (DECL_INITIAL (real_decl) == error_mark_node
> +          && (node = varpool_get_node (real_decl)))
> +        return varpool_get_constructor (node);
>        if (DECL_INITIAL (real_decl))
>          return DECL_INITIAL (real_decl);
>        else
> @@ -349,6 +390,9 @@ ctor_for_folding (tree decl)
>
>       ??? Previously we behaved so for scalar variables but not for array
>       accesses.  */
> +  if (DECL_INITIAL (real_decl) == error_mark_node
> +      && (node = varpool_get_node (real_decl)))
> +    return varpool_get_constructor (node);
>    return DECL_INITIAL (real_decl);
>  }
>
> @@ -471,6 +515,7 @@ varpool_assemble_decl (varpool_node *nod
>    if (!node->in_other_partition
>        && !DECL_EXTERNAL (decl))
>      {
> +      varpool_get_constructor (node);
>        assemble_variable (decl, 0, 1, 0);
>        gcc_assert (TREE_ASM_WRITTEN (decl));
>        node->definition = true;
> Index: lto-streamer.h
> ===================================================================
> --- lto-streamer.h	(revision 212426)
> +++ lto-streamer.h	(working copy)
> @@ -685,9 +685,9 @@ struct output_block
>       far and the indexes assigned to them.  */
>    hash_table<string_slot_hasher> *string_hash_table;
>
> -  /* The current cgraph_node that we are currently serializing.  Null
> +  /* The current symbol that we are currently serializing.  Null
>       if we are serializing something else.  */
> -  struct cgraph_node *cgraph_node;
> +  struct symtab_node *symbol;
>
>    /* These are the last file and line that were seen in the stream.
>       If the current node differs from these, it needs to insert
> @@ -830,6 +830,9 @@ extern void lto_reader_init (void);
>  extern void lto_input_function_body (struct lto_file_decl_data *,
>                                       struct cgraph_node *,
>                                       const char *);
> +extern void lto_input_variable_constructor (struct lto_file_decl_data *,
> +                                            struct varpool_node *,
> +                                            const char *);
>  extern void lto_input_constructors_and_inits (struct lto_file_decl_data *,
>                                                const char *);
>  extern void lto_input_toplevel_asms (struct lto_file_decl_data *, int);
> Index: ipa-visibility.c
> ===================================================================
> --- ipa-visibility.c	(revision 212426)
> +++ ipa-visibility.c	(working copy)
> @@ -686,6 +686,8 @@ function_and_variable_visibility (bool w
>            if (found)
>              {
>                struct pointer_set_t *visited_nodes = pointer_set_create ();
> +
> +              varpool_get_constructor (vnode);
>                walk_tree (&DECL_INITIAL (vnode->decl),
>                           update_vtable_references, NULL, visited_nodes);
>                pointer_set_destroy (visited_nodes);
> Index: cgraph.h
> ===================================================================
> --- cgraph.h	(revision 212426)
> +++ cgraph.h	(working copy)
> @@ -1142,6 +1142,7 @@ void varpool_add_new_variable (tree);
>  void symtab_initialize_asm_name_hash (void);
>  void symtab_prevail_in_asm_name_hash (symtab_node *node);
>  void varpool_remove_initializer (varpool_node *);
> +tree varpool_get_constructor (struct varpool_node *node);
>
>  /* In cgraph.c  */
>  extern void change_decl_assembler_name (tree, tree);
> Index: lto-streamer-out.c
> ===================================================================
> --- lto-streamer-out.c	(revision 212426)
> +++ lto-streamer-out.c	(working copy)
> @@ -318,7 +319,7 @@ lto_is_streamable (tree expr)
>  /* For EXPR lookup and return what we want to stream to OB as DECL_INITIAL.  */
>
>  static tree
> -get_symbol_initial_value (struct output_block *ob, tree expr)
> +get_symbol_initial_value (lto_symtab_encoder_t encoder, tree expr)
>  {
>    gcc_checking_assert (DECL_P (expr)
>                         && TREE_CODE (expr) != FUNCTION_DECL
> @@ -331,15 +332,13 @@ get_symbol_initial_value (struct output_
>        && !DECL_IN_CONSTANT_POOL (expr)
>        && initial)
>      {
> -      lto_symtab_encoder_t encoder;
>        varpool_node *vnode;
> -
> -      encoder = ob->decl_state->symtab_node_encoder;
> -      vnode = varpool_get_node (expr);
> -      if (!vnode
> -          || !lto_symtab_encoder_encode_initializer_p (encoder,
> -                                                       vnode))
> -        initial = error_mark_node;
> +      /* Extra section needs about 30 bytes; do not produce it for simple
> +         scalar values.  */
> +      if (TREE_CODE (DECL_INITIAL (expr)) == CONSTRUCTOR
> +          || !(vnode = varpool_get_node (expr))
> +          || !lto_symtab_encoder_encode_initializer_p (encoder, vnode))
> +        initial = error_mark_node;
>      }
>
>    return initial;
> @@ -369,7 +368,8 @@ lto_write_tree_1 (struct output_block *o
>        && TREE_CODE (expr) != TRANSLATION_UNIT_DECL)
>      {
>        /* Handle DECL_INITIAL for symbols.  */
> -      tree initial = get_symbol_initial_value (ob, expr);
> +      tree initial = get_symbol_initial_value
> +                       (ob->decl_state->symtab_node_encoder, expr);
>        stream_write_tree (ob, initial, ref_p);
>      }
>  }
> @@ -1195,7 +1286,8 @@ DFS_write_tree (struct output_block *ob,
>            && TREE_CODE (expr) != TRANSLATION_UNIT_DECL)
>          {
>            /* Handle DECL_INITIAL for symbols.  */
> -          tree initial = get_symbol_initial_value (ob, expr);
> +          tree initial = get_symbol_initial_value (ob->decl_state->symtab_node_encoder,
> +                                                   expr);
>            DFS_write_tree (ob, cstate, initial, ref_p, ref_p);
>          }
>      }
> @@ -1808,7 +1900,7 @@ output_function (struct cgraph_node *nod
>    ob = create_output_block (LTO_section_function_body);
>
>    clear_line_info (ob);
> -  ob->cgraph_node = node;
> +  ob->symbol = node;
>
>    gcc_assert (current_function_decl == NULL_TREE && cfun == NULL);
>
> @@ -1899,6 +1991,32 @@ output_function (struct cgraph_node *nod
>    destroy_output_block (ob);
>  }
>
> +/* Output the constructor of variable NODE->DECL.  */
> +
> +static void
> +output_constructor (struct varpool_node *node)
> +{
> +  tree var = node->decl;
> +  struct output_block *ob;
> +
> +  ob = create_output_block (LTO_section_function_body);
> +
> +  clear_line_info (ob);
> +  ob->symbol = node;
> +
> +  /* Make string 0 be a NULL string.  */
> +  streamer_write_char_stream (ob->string_stream, 0);
> +
> +  /* Output DECL_INITIAL for the variable, which holds its
> +     constructor.  */
> +  stream_write_tree (ob, DECL_INITIAL (var), true);
> +
> +  /* Create a section to hold the pickled output of this variable.  */
> +  produce_asm (ob, var);
> +
> +  destroy_output_block (ob);
> +}
> +
>
>  /* Emit toplevel asms.  */
>
> @@ -1957,10 +2075,10 @@ lto_output_toplevel_asms (void)
>  }
>
>
> -/* Copy the function body of NODE without deserializing.  */
> +/* Copy the function body or variable constructor of NODE without deserializing.  */
>
>  static void
> -copy_function (struct cgraph_node *node)
> +copy_function_or_variable (struct symtab_node *node)
>  {
>    tree function = node->decl;
>    struct lto_file_decl_data *file_data = node->lto_file_data;
> @@ -2072,7 +2190,7 @@ lto_output (void)
>            if (gimple_has_body_p (node->decl) || !flag_wpa)
>              output_function (node);
>            else
> -            copy_function (node);
> +            copy_function_or_variable (node);
>            gcc_assert (lto_get_out_decl_state () == decl_state);
>            lto_pop_out_decl_state ();
>            lto_record_function_out_decl_state (node->decl, decl_state);
> @@ -2085,6 +2203,25 @@ lto_output (void)
>            tree ctor = DECL_INITIAL (node->decl);
>            if (ctor && !in_lto_p)
>              walk_tree (&ctor, wrap_refs, NULL, NULL);
> +          if (get_symbol_initial_value (encoder, node->decl) == error_mark_node
> +              && lto_symtab_encoder_encode_initializer_p (encoder, node)
> +              && !node->alias)
> +            {
> +#ifdef ENABLE_CHECKING
> +              gcc_assert (!bitmap_bit_p (output, DECL_UID (node->decl)));
> +              bitmap_set_bit (output, DECL_UID (node->decl));
> +#endif
> +              decl_state = lto_new_out_decl_state ();
> +              lto_push_out_decl_state (decl_state);
> +              if (DECL_INITIAL (node->decl) != error_mark_node
> +                  || !flag_wpa)
> +                output_constructor (node);
> +              else
> +                copy_function_or_variable (node);
> +              gcc_assert (lto_get_out_decl_state () == decl_state);
> +              lto_pop_out_decl_state ();
> +              lto_record_function_out_decl_state (node->decl, decl_state);
> +            }
>          }
>      }
>
> Index: lto-streamer-in.c
> ===================================================================
> --- lto-streamer-in.c	(revision 212426)
> +++ lto-streamer-in.c	(working copy)
> @@ -1029,6 +1029,15 @@ input_function (tree fn_decl, struct dat
>    pop_cfun ();
>  }
>
> +/* Read the constructor of variable VAR from DATA_IN using input block IB.  */
> +
> +static void
> +input_constructor (tree var, struct data_in *data_in,
> +                   struct lto_input_block *ib)
> +{
> +  DECL_INITIAL (var) = stream_read_tree (ib, data_in);
> +}
> +
>
>  /* Read the body from DATA for function NODE and fill it in.
>     FILE_DATA are the global decls and types.  SECTION_TYPE is either
> @@ -1037,8 +1046,8 @@ input_function (tree fn_decl, struct dat
>     that function.  */
>
>  static void
> -lto_read_body (struct lto_file_decl_data *file_data, struct cgraph_node *node,
> -               const char *data, enum lto_section_type section_type)
> +lto_read_body_or_constructor (struct lto_file_decl_data *file_data, struct symtab_node *node,
> +                              const char *data, enum lto_section_type section_type)
>  {
>    const struct lto_function_header *header;
>    struct data_in *data_in;
> @@ -1050,19 +1059,32 @@ lto_read_body (struct lto_file_decl_data
>    tree fn_decl = node->decl;
>
>    header = (const struct lto_function_header *) data;
> -  cfg_offset = sizeof (struct lto_function_header);
> -  main_offset = cfg_offset + header->cfg_size;
> -  string_offset = main_offset + header->main_size;
> -
> -  LTO_INIT_INPUT_BLOCK (ib_cfg,
> -                        data + cfg_offset,
> -                        0,
> -                        header->cfg_size);
> -
> -  LTO_INIT_INPUT_BLOCK (ib_main,
> -                        data + main_offset,
> -                        0,
> -                        header->main_size);
> +  if (TREE_CODE (node->decl) == FUNCTION_DECL)
> +    {
> +      cfg_offset = sizeof (struct lto_function_header);
> +      main_offset = cfg_offset + header->cfg_size;
> +      string_offset = main_offset + header->main_size;
> +
> +      LTO_INIT_INPUT_BLOCK (ib_cfg,
> +                            data + cfg_offset,
> +                            0,
> +                            header->cfg_size);
> +
> +      LTO_INIT_INPUT_BLOCK (ib_main,
> +                            data + main_offset,
> +                            0,
> +                            header->main_size);
> +    }
> +  else
> +    {
> +      main_offset = sizeof (struct lto_function_header);
> +      string_offset = main_offset + header->main_size;
> +
> +      LTO_INIT_INPUT_BLOCK (ib_main,
> +                            data + main_offset,
> +                            0,
> +                            header->main_size);
> +    }
>
>    data_in = lto_data_in_create (file_data, data + string_offset,
>                                  header->string_size, vNULL);
> @@ -1082,7 +1104,10 @@ lto_read_body (struct lto_file_decl_data
>
>        /* Set up the struct function.  */
>        from = data_in->reader_cache->nodes.length ();
> -      input_function (fn_decl, data_in, &ib_main, &ib_cfg);
> +      if (TREE_CODE (node->decl) == FUNCTION_DECL)
> +        input_function (fn_decl, data_in, &ib_main, &ib_cfg);
> +      else
> +        input_constructor (fn_decl, data_in, &ib_main);
>        /* And fixup types we streamed locally.  */
>          {
>            struct streamer_tree_cache_d *cache = data_in->reader_cache;
> @@ -1124,7 +1149,17 @@ void
>  lto_input_function_body (struct lto_file_decl_data *file_data,
>                           struct cgraph_node *node, const char *data)
>  {
> -  lto_read_body (file_data, node, data, LTO_section_function_body);
> +  lto_read_body_or_constructor (file_data, node, data, LTO_section_function_body);
> +}
> +
> +/* Read the constructor of NODE using DATA.  FILE_DATA holds the global
> +   decls and types.  */
> +
> +void
> +lto_input_variable_constructor (struct lto_file_decl_data *file_data,
> +                                struct varpool_node *node, const char *data)
> +{
> +  lto_read_body_or_constructor (file_data, node, data, LTO_section_function_body);
>  }
>
>
> Index: ipa-prop.c
> ===================================================================
> --- ipa-prop.c	(revision 212426)
> +++ ipa-prop.c	(working copy)
> @@ -4835,7 +4864,7 @@ ipa_prop_write_jump_functions (void)
>
>    ob = create_output_block (LTO_section_jump_functions);
>    encoder = ob->decl_state->symtab_node_encoder;
> -  ob->cgraph_node = NULL;
> +  ob->symbol = NULL;
>    for (lsei = lsei_start_function_in_partition (encoder); !lsei_end_p (lsei);
>         lsei_next_function_in_partition (&lsei))
>      {
> @@ -5011,7 +5040,7 @@ ipa_prop_write_all_agg_replacement (void
>
>    ob = create_output_block (LTO_section_ipcp_transform);
>    encoder = ob->decl_state->symtab_node_encoder;
> -  ob->cgraph_node = NULL;
> +  ob->symbol = NULL;
>    for (lsei = lsei_start_function_in_partition (encoder); !lsei_end_p (lsei);
>         lsei_next_function_in_partition (&lsei))
>      {
>

-- 
Richard Biener <rguent...@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
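The offset arithmetic in lto_read_body_or_constructor in the quoted patch boils down to one difference: a function section carries a CFG stream between the header and the main stream, while a constructor section goes straight from the header to the main stream. The C sketch below models only that layout. It assumes a simplified header, `toy_header`, holding just the three sizes the patch reads (cfg_size, main_size, string_size); the real lto_function_header may carry more, and `toy_layout`, `toy_offsets` and `toy_section_size` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the section header: only the three sizes
   read by lto_read_body_or_constructor are modeled.  */
struct toy_header
{
  size_t cfg_size;     /* Length of the CFG stream (functions only).  */
  size_t main_size;    /* Length of the main tree stream.  */
  size_t string_size;  /* Length of the trailing string table.  */
};

/* Byte offsets of each stream, measured from the start of the section.  */
struct toy_layout
{
  size_t cfg_offset;   /* Stays 0 for constructor sections.  */
  size_t main_offset;
  size_t string_offset;
};

/* Function sections:    header | cfg | main | strings
   Constructor sections: header | main | strings  */
static struct toy_layout
toy_offsets (const struct toy_header *h, bool is_function)
{
  struct toy_layout l = { 0, 0, 0 };

  assert (h != NULL);
  if (is_function)
    {
      l.cfg_offset = sizeof (struct toy_header);
      l.main_offset = l.cfg_offset + h->cfg_size;
    }
  else
    l.main_offset = sizeof (struct toy_header);
  l.string_offset = l.main_offset + h->main_size;
  return l;
}

/* Total section length: everything up to the end of the string table.  */
static size_t
toy_section_size (const struct toy_header *h, bool is_function)
{
  return toy_offsets (h, is_function).string_offset + h->string_size;
}
```

Because the constructor branch never sets up a CFG block, the patch hands only the main input block to input_constructor; the model reflects that by leaving cfg_offset at 0.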