On Sun, Oct 6, 2019 at 4:38 PM Jan Hubicka <hubi...@ucw.cz> wrote: > > > On 9/19/19 2:33 AM, Martin Liška wrote: > > > Hi. > > > > > > Function reordering has been around for quite some time and a naive > > > implementation was also part of my diploma thesis some time ago. > > > Currently, the GCC can reorder function based on first execution, which > > > happens with PGO and LTO of course. Known limitation is that the order > > > is preserved only partially as various symbols go into different LTRANS > > > partitions. > > > > > > There has been some research in the area and I would point out the > > > Facebook paper > > > ([1]) and Sony presentation ([2]). Based on that, I decided to make a new > > > implementation > > > in the GCC that does the same (in a proper way). First part of the > > > enablement are patches > > > to ld.bfd and ld.gold that come up with a new section .text.sorted, that > > > is always sorted. > > > There's a snippet from the modified default linker script: > > Funny, I was doing this via linker scripts circa ~95, in fact that's why > > we have -ffunction-sections :-) We started with scripts which > > post-processed profiling data to create linker scripts for ELF systems. > > We had something for HPUX/SOM as well, but I can't remember what > > mechanism we used, it may have been a gross level sorting using the SOM > > section sort key mechanism - it only had 128 or 256 keys with a notable > > amount of them reserved. > > > > We had also built a linker with a basic level of interposition circa > > 1993 and explored various approaches to reordering executables. I'd > > just joined the group at the time and was responsible for wiring up > > stuff on the OS side, but eventually got the "pleasure" of owning the > > linker server. A lot of the C3 algorithmic stuff looks similar to what > > we did. > > For reference I am attaching early LTO version from circa 2010 :) > > > > Anyway... > > > > I don't see anything objectionable in here. It's noted as an RFC. Are > > you interested in pushing this forward for gcc-10? > > I think it is plan to get some form of code layout pass into GCC10. I > will test Martin's version on Firefox and see if it have any effect > here. It is generally quite hard to evaluate those changes (and it is > reason why I did not moved forward with version form 2010 - more > precisely it kept falling off the shelf for about a decade) > > If LLD has support for cgraph annotations and we support LLD, i think we > should add support for that, too - it will be very useful to compare > indvidiual implementations. > I believe there is gold support (off tree?) for that too and something > similar is also supported by other toolchains like AIX? > > One problem with the sections approach is that every section gets > aligned to largest code alignment inside of the section. Since our > alignments are (should be) often cache line based we get cca 30 bytes of > waste for every function that is quite a lot. > > This is why we currently have way to order function when outputting them > and use that with FDO (using Martin's first execution logic). This has > drwarback of making the functions to flow in that order through late > optimizations and RTL backend and thus we lose IPA-RA and some > IP propagation (late pure/const/nothrow discovery).
But you can also fix that by the parallelization GSoC project approach, decoupling things at RTL expansion IPA-wise and using output order for the latter (or even do the fragments in GCC itself by refactoring the output machinery to use function-local "strings", assembling the final file later). Refactoring output to handle multiple output "files" at the same time might also help getting rid of that early-LTO-debug copying stuff (now that it seems that linker-plugin extensions for partially claiming files will never happen...) > So this approach has a drawback, too. It is why i was trying to push > myself to get gcc to use gas fragments :) > > Anyway, all these issues can be sovled incementally - lets see how > Maritn's patch work on Firefox and if we can get it tested elsewhere and > start from that. > > I will take a look into the details. > > Honza > > > > jeff > > > > Index: tree-pass.h > =================================================================== > *** tree-pass.h (revision 164689) > --- tree-pass.h (working copy) > *************** extern struct ipa_opt_pass_d pass_ipa_lt > *** 467,472 **** > --- 467,473 ---- > extern struct ipa_opt_pass_d pass_ipa_lto_finish_out; > extern struct ipa_opt_pass_d pass_ipa_profile; > extern struct ipa_opt_pass_d pass_ipa_cdtor_merge; > + extern struct ipa_opt_pass_d pass_ipa_func_reorder; > > extern struct gimple_opt_pass pass_all_optimizations; > extern struct gimple_opt_pass pass_cleanup_cfg_post_optimizing; > Index: cgraphunit.c > =================================================================== > *** cgraphunit.c (revision 164689) > --- cgraphunit.c (working copy) > *************** cgraph_inline_p (struct cgraph_edge *e, > *** 1517,1522 **** > --- 1517,1529 ---- > return !e->inline_failed; > } > > + static int > + node_cmp (const void *pa, const void *pb) > + { > + const struct cgraph_node *a = *(const struct cgraph_node * const *) pa; > + const struct cgraph_node *b = *(const struct cgraph_node * const *) pb; > + return b->order - a->order; > + } > > > /* Expand all functions that must be output. > *************** cgraph_expand_all_functions (void) > *** 1546,1551 **** > --- 1553,1561 ---- > if (order[i]->process) > order[new_order_pos++] = order[i]; > > + if (flag_ipa_reorder) > + qsort (order, new_order_pos, sizeof (struct cgraph_node *), node_cmp); > + > for (i = new_order_pos - 1; i >= 0; i--) > { > node = order[i]; > Index: tree-ssa-ccp.c > =================================================================== > *** tree-ssa-ccp.c (revision 164689) > --- tree-ssa-ccp.c (working copy) > *************** ccp_fold_stmt (gimple_stmt_iterator *gsi > *** 2307,2312 **** > --- 2307,2324 ---- > changed = true; > } > } > + if (TREE_CODE (gimple_call_fn (stmt)) == OBJ_TYPE_REF) > + { > + tree expr = OBJ_TYPE_REF_EXPR (gimple_call_fn (stmt)); > + expr = valueize_op (expr); > + if (TREE_CODE (expr) == ADDR_EXPR > + && TREE_CODE (TREE_OPERAND (expr, 0)) == FUNCTION_DECL) > + { > + gimple_call_set_fndecl (stmt, TREE_OPERAND (expr, 0)); > + debug_gimple_stmt (stmt); > + changed = true; > + } > + } > > return changed; > } > Index: lto-cgraph.c > =================================================================== > *** lto-cgraph.c (revision 164689) > --- lto-cgraph.c (working copy) > *************** lto_output_node (struct lto_simple_outpu > *** 466,471 **** > --- 466,473 ---- > > if (tag == LTO_cgraph_analyzed_node) > { > + lto_output_uleb128_stream (ob->main_stream, > + node->order); > lto_output_sleb128_stream (ob->main_stream, > > node->local.inline_summary.estimated_self_stack_size); > lto_output_sleb128_stream (ob->main_stream, > *************** input_overwrite_node (struct lto_file_de > *** 927,932 **** > --- 929,935 ---- > struct cgraph_node *node, > enum LTO_cgraph_tags tag, > struct bitpack_d *bp, > + int order, > unsigned int stack_size, > unsigned int self_time, > unsigned int time_inlining_benefit, > *************** input_overwrite_node (struct lto_file_de > *** 935,940 **** > --- 938,944 ---- > enum ld_plugin_symbol_resolution resolution) > { > node->aux = (void *) tag; > + node->order = order; > node->local.inline_summary.estimated_self_stack_size = stack_size; > node->local.inline_summary.self_time = self_time; > node->local.inline_summary.time_inlining_benefit = time_inlining_benefit; > *************** input_node (struct lto_file_decl_data *f > *** 1027,1032 **** > --- 1031,1037 ---- > unsigned long same_body_count = 0; > int clone_ref; > enum ld_plugin_symbol_resolution resolution; > + int order = 0; > > clone_ref = lto_input_sleb128 (ib); > > *************** input_node (struct lto_file_decl_data *f > *** 1045,1050 **** > --- 1050,1056 ---- > > if (tag == LTO_cgraph_analyzed_node) > { > + order = lto_input_uleb128 (ib); > stack_size = lto_input_sleb128 (ib); > self_size = lto_input_sleb128 (ib); > size_inlining_benefit = lto_input_sleb128 (ib); > *************** input_node (struct lto_file_decl_data *f > *** 1066,1072 **** > > bp = lto_input_bitpack (ib); > resolution = (enum ld_plugin_symbol_resolution)lto_input_uleb128 (ib); > ! input_overwrite_node (file_data, node, tag, &bp, stack_size, self_time, > time_inlining_benefit, self_size, > size_inlining_benefit, resolution); > > --- 1072,1078 ---- > > bp = lto_input_bitpack (ib); > resolution = (enum ld_plugin_symbol_resolution)lto_input_uleb128 (ib); > ! input_overwrite_node (file_data, node, tag, &bp, order, stack_size, > self_time, > time_inlining_benefit, self_size, > size_inlining_benefit, resolution); > > Index: gimple-fold.c > =================================================================== > *** gimple-fold.c (revision 164689) > --- gimple-fold.c (working copy) > *************** gimple_fold_obj_type_ref (tree ref, tree > *** 1465,1470 **** > --- 1465,1480 ---- > tree known_binfo = known_type ? TYPE_BINFO (known_type) : NULL_TREE; > tree binfo; > > + if (TREE_CODE (obj) == SSA_NAME) > + { > + gimple def_stmt = SSA_NAME_DEF_STMT (obj); > + if (def_stmt > + && gimple_code (def_stmt) == GIMPLE_ASSIGN > + && (gimple_assign_single_p (def_stmt) > + || (gimple_assign_unary_nop_p (def_stmt)))) > + obj = gimple_assign_rhs1 (def_stmt); > + } > + > if (TREE_CODE (obj) == ADDR_EXPR) > obj = TREE_OPERAND (obj, 0); > > *************** fold_gimple_call (gimple_stmt_iterator * > *** 1513,1520 **** > copying EH region info to the new node. Easier to just do it > here where we can just smash the call operand. */ > callee = gimple_call_fn (stmt); > ! if (TREE_CODE (callee) == OBJ_TYPE_REF > ! && TREE_CODE (OBJ_TYPE_REF_OBJECT (callee)) == ADDR_EXPR) > { > tree t; > > --- 1523,1529 ---- > copying EH region info to the new node. Easier to just do it > here where we can just smash the call operand. */ > callee = gimple_call_fn (stmt); > ! if (TREE_CODE (callee) == OBJ_TYPE_REF) > { > tree t; > > Index: common.opt > =================================================================== > *** common.opt (revision 164689) > --- common.opt (working copy) > *************** fdse > *** 1107,1112 **** > --- 1107,1116 ---- > Common Var(flag_dse) Init(1) Optimization > Use the RTL dead store elimination pass > > + freorder-functions > + Common Report Var(flag_ipa_reorder) Init(1) Optimization > + Reroder functions for better locality at execution time > + > freschedule-modulo-scheduled-loops > Common Report Var(flag_resched_modulo_sched) Optimization > Enable/Disable the traditional scheduling in loops that already passed > modulo scheduling > Index: ipa-reorder.c > =================================================================== > *** ipa-reorder.c (revision 0) > --- ipa-reorder.c (revision 0) > *************** > *** 0 **** > --- 1,567 ---- > + /* IPA function reordering pass. > + Copyright (C) 2010 Free Software Foundation, Inc. > + Contributed by Jan Hubicka <j...@suse.cz> > + > + This file is part of GCC. > + > + GCC is free software; you can redistribute it and/or modify it under > + the terms of the GNU General Public License as published by the Free > + Software Foundation; either version 3, or (at your option) any later > + version. > + > + GCC is distributed in the hope that it will be useful, but WITHOUT ANY > + WARRANTY; without even the implied warranty of MERCHANTABILITY or > + FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License > + for more details. > + > + You should have received a copy of the GNU General Public License > + along with GCC; see the file COPYING3. If not see > + <http://www.gnu.org/licenses/>. */ > + > + #include "config.h" > + #include "system.h" > + #include "coretypes.h" > + #include "tm.h" > + #include "cgraph.h" > + #include "tree-pass.h" > + #include "timevar.h" > + #include "gimple.h" > + #include "ggc.h" > + #include "flags.h" > + #include "pointer-set.h" > + #include "target.h" > + #include "tree-iterator.h" > + #include "fibheap.h" > + > + struct priority > + { > + gcov_type priority; > + struct partition *src_partition; > + struct partition *dest_partition; > + fibnode_t node; > + }; > + > + typedef struct priority *priority_ptr; > + > + DEF_VEC_P(priority_ptr); > + DEF_VEC_ALLOC_P(priority_ptr,heap); > + > + struct partition > + { > + int index; > + VEC(cgraph_node_ptr, heap) *nodes; > + VEC(priority_ptr, heap) *priorities; > + VEC(priority_ptr, heap) *in_priorities; > + }; > + > + typedef struct partition *partition_ptr; > + > + DEF_VEC_P(partition_ptr); > + DEF_VEC_ALLOC_P(partition_ptr,heap); > + > + static bool > + cgraph_node_will_be_output_p (struct cgraph_node *node) > + { > + return (node->analyzed && !node->global.inlined_to); > + } > + > + static void > + account_callees (struct partition *part, > + VEC (partition_ptr, heap) *partitions, > + struct cgraph_node *node) > + { > + struct cgraph_edge *edge; > + struct priority *p; > + int i; > + struct ipa_ref *ref; > + > + for (edge = node->callees; edge; edge = edge->next_callee) > + if (edge->inline_failed) > + { > + if (!edge->callee->aux) > + { > + struct partition *dest = VEC_index (partition_ptr, partitions, > + edge->callee->uid); > + if (dest && dest != part) > + { > + p = XCNEW (struct priority); > + edge->callee->aux = p; > + p->src_partition = part; > + p->dest_partition = dest; > + VEC_safe_push (priority_ptr, heap, part->priorities, p); > + VEC_safe_push (priority_ptr, heap, dest->in_priorities, p); > + } > + else > + continue; > + } > + else > + p = (struct priority *)edge->callee->aux; > + if (edge->count) > + p->priority += edge->count + 1; > + else > + { > + int mul = 1; > + switch (edge->caller->frequency) > + { > + case NODE_FREQUENCY_UNLIKELY_EXECUTED: mul = 0; break; > + case NODE_FREQUENCY_EXECUTED_ONCE: mul = 1; break; > + case NODE_FREQUENCY_NORMAL: mul = 10; break; > + case NODE_FREQUENCY_HOT: mul = 1000; break; > + } > + p->priority += edge->frequency * mul + 1; > + } > + } > + else > + account_callees (part, partitions, edge->callee); > + > + /* Account taking address of a function as possible call with a low > priority. */ > + for (i = 0; ipa_ref_list_reference_iterate (&node->ref_list, i, ref); i++) > + if (ref->refered_type == IPA_REF_CGRAPH) > + { > + struct cgraph_node *node2 = ipa_ref_node (ref); > + if (!node2->aux) > + { > + struct partition *dest = VEC_index (partition_ptr, partitions, > + node2->uid); > + if (dest && dest != part) > + { > + p = XCNEW (struct priority); > + node2->aux = p; > + p->src_partition = part; > + p->dest_partition = dest; > + VEC_safe_push (priority_ptr, heap, part->priorities, p); > + VEC_safe_push (priority_ptr, heap, dest->in_priorities, p); > + } > + else > + continue; > + } > + else > + p = (struct priority *)node2->aux; > + p->priority++; > + } > + } > + > + static void > + clear_callees (struct cgraph_node *node) > + { > + struct cgraph_edge *edge; > + int i; > + struct ipa_ref *ref; > + > + for (edge = node->callees; edge; edge = edge->next_callee) > + if (edge->inline_failed) > + edge->callee->aux = NULL; > + else > + clear_callees (edge->callee); > + for (i = 0; ipa_ref_list_reference_iterate (&node->ref_list, i, ref); i++) > + if (ref->refered_type == IPA_REF_CGRAPH) > + ipa_ref_node (ref)->aux = NULL; > + } > + > + /* Compare two priority vector entries via indexes of destination > + partitions. */ > + > + static int > + priority_cmp (const void *pa, const void *pb) > + { > + const struct priority *a = *(const struct priority * const *) pa; > + const struct priority *b = *(const struct priority * const *) pb; > + /* Priorities pointing to dead partitions are removed lazily, > + so just avoid referencing dead memory. */ > + if (!a->priority) > + return ! b->priority ? 0 : -1; > + if (!b->priority) > + return 1; > + return a->dest_partition->index - b->dest_partition->index; > + } > + > + /* Compare two priority vector entries via indexes of destination > + partitions. */ > + > + static int > + in_priority_cmp (const void *pa, const void *pb) > + { > + const struct priority *a = *(const struct priority * const *) pa; > + const struct priority *b = *(const struct priority * const *) pb; > + /* Priorities pointing to dead partitions are removed lazily, > + so just avoid referencing dead memory. */ > + if (!a->priority) > + return ! b->priority ? 0 : -1; > + if (!b->priority) > + return 1; > + return a->src_partition->index - b->src_partition->index; > + } > + > + static void > + delete_priority (fibheap_t heap, struct priority *p) > + { > + p->priority = 0; > + p->src_partition = p->dest_partition = NULL; > + if (p->node) > + fibheap_delete_node (heap, p->node); > + p->node = NULL; > + } > + > + static void > + merge_partitions (VEC (partition_ptr, heap) *partitions, > + fibheap_t heap, > + struct partition *part1, > + struct partition *part2) > + { > + VEC (priority_ptr, heap) *priorities = NULL; > + VEC (priority_ptr, heap) *in_priorities = NULL; > + unsigned int i1 = 0, i2 = 0; > + struct cgraph_node *node; > + int i; > + > + /* We keep priority lists ordered by indexes of destination partitions > + to allow effective merging. */ > + qsort (VEC_address (priority_ptr, part1->priorities), > + VEC_length (priority_ptr, part1->priorities), > + sizeof (priority_ptr), > + priority_cmp); > + qsort (VEC_address (priority_ptr, part2->priorities), > + VEC_length (priority_ptr, part2->priorities), > + sizeof (priority_ptr), > + priority_cmp); > + > + /* Merge priority lists, prune out references of partition to itself. > + Assume priority lists are ordered by indexes of destination partitions > + and keep them so. */ > + while (1) > + { > + struct priority *p1, *p2; > + > + while (i1 < VEC_length (priority_ptr, part1->priorities) > + && !VEC_index (priority_ptr, part1->priorities, i1)->priority) > + i1++; > + while (i2 < VEC_length (priority_ptr, part2->priorities) > + && !VEC_index (priority_ptr, part2->priorities, i2)->priority) > + i2++; > + > + if (i1 == VEC_length (priority_ptr, part1->priorities) > + && i2 == VEC_length (priority_ptr, part2->priorities)) > + break; > + > + if (i1 < VEC_length (priority_ptr, part1->priorities) > + && (i2 == VEC_length (priority_ptr, part2->priorities) > + || (VEC_index (priority_ptr, part1->priorities, > + i1)->dest_partition->index > + < VEC_index (priority_ptr, part2->priorities, > + i2)->dest_partition->index))) > + { > + p1 = VEC_index (priority_ptr, part1->priorities, i1); > + if (p1->dest_partition != part2) > + VEC_safe_push (priority_ptr, heap, priorities, p1); > + else > + delete_priority (heap, p1); > + i1++; > + } > + else if (i2 < VEC_length (priority_ptr, part2->priorities) > + && (i1 == VEC_length (priority_ptr, part1->priorities) > + || (VEC_index (priority_ptr, part1->priorities, > + i1)->dest_partition->index > + > VEC_index (priority_ptr, part2->priorities, > + i2)->dest_partition->index))) > + { > + p2 = VEC_index (priority_ptr, part2->priorities, i2); > + if (p2->dest_partition != part1) > + { > + VEC_safe_push (priority_ptr, heap, priorities, p2); > + p2->src_partition = part1; > + } > + else > + delete_priority (heap, p2); > + i2++; > + } > + else > + { > + p1 = VEC_index (priority_ptr, part1->priorities, i1); > + p2 = VEC_index (priority_ptr, part2->priorities, i2); > + p1->priority += p2->priority; > + fibheap_replace_key (heap, p1->node, INT_MAX - p1->priority); > + delete_priority (heap, p2); > + VEC_safe_push (priority_ptr, heap, priorities, p1); > + i1++; > + i2++; > + } > + } > + VEC_free (priority_ptr, heap, part1->priorities); > + part1->priorities = priorities; > + > + qsort (VEC_address (priority_ptr, part1->in_priorities), > + VEC_length (priority_ptr, part1->in_priorities), > + sizeof (priority_ptr), > + in_priority_cmp); > + qsort (VEC_address (priority_ptr, part2->in_priorities), > + VEC_length (priority_ptr, part2->in_priorities), > + sizeof (priority_ptr), > + in_priority_cmp); > + i1 = 0; > + i2 = 0; > + while (1) > + { > + struct priority *p1, *p2; > + while (i1 < VEC_length (priority_ptr, part1->in_priorities) > + && !VEC_index (priority_ptr, part1->in_priorities, i1)->priority) > + i1++; > + while (i2 < VEC_length (priority_ptr, part2->in_priorities) > + && !VEC_index (priority_ptr, part2->in_priorities, i2)->priority) > + i2++; > + > + if (i1 == VEC_length (priority_ptr, part1->in_priorities) > + && i2 == VEC_length (priority_ptr, part2->in_priorities)) > + break; > + > + if (i1 < VEC_length (priority_ptr, part1->in_priorities) > + && (i2 == VEC_length (priority_ptr, part2->in_priorities) > + || (VEC_index (priority_ptr, part1->in_priorities, > + i1)->src_partition->index > + < VEC_index (priority_ptr, part2->in_priorities, > + i2)->src_partition->index))) > + { > + p1 = VEC_index (priority_ptr, part1->in_priorities, i1); > + if (p1->src_partition != part2) > + VEC_safe_push (priority_ptr, heap, in_priorities, p1); > + else > + delete_priority (heap, p1); > + i1++; > + } > + else if (i2 < VEC_length (priority_ptr, part2->in_priorities) > + && (i1 == VEC_length (priority_ptr, part1->in_priorities) > + || (VEC_index (priority_ptr, part1->in_priorities, > + i1)->src_partition->index > + > VEC_index (priority_ptr, part2->in_priorities, > + i2)->src_partition->index))) > + { > + p2 = VEC_index (priority_ptr, part2->in_priorities, i2); > + if (p2->src_partition != part1) > + { > + VEC_safe_push (priority_ptr, heap, in_priorities, p2); > + p2->dest_partition = part1; > + } > + else > + delete_priority (heap, p2); > + i2++; > + } > + else > + { > + p1 = VEC_index (priority_ptr, part1->in_priorities, i1); > + p2 = VEC_index (priority_ptr, part2->in_priorities, i2); > + p1->priority += p2->priority; > + fibheap_replace_key (heap, p1->node, INT_MAX - p1->priority); > + delete_priority (heap, p2); > + VEC_safe_push (priority_ptr, heap, in_priorities, p1); > + i1++; > + i2++; > + } > + } > + VEC_free (priority_ptr, heap, part1->in_priorities); > + part1->in_priorities = in_priorities; > + > + for (i = 0; VEC_iterate (cgraph_node_ptr, part2->nodes, i, node); i++) > + VEC_safe_push (cgraph_node_ptr, heap, part1->nodes, node); > + > + VEC_replace (partition_ptr, partitions, part2->index, NULL); > + VEC_free (priority_ptr, heap, part2->priorities); > + VEC_free (priority_ptr, heap, part2->in_priorities); > + VEC_free (cgraph_node_ptr, heap, part2->nodes); > + free (part2); > + } > + > + static void > + dump_partition (struct partition *part) > + { > + int i; > + struct cgraph_node *node; > + struct priority *p; > + fprintf (dump_file, " Partition %i:", part->index); > + for (i = 0; VEC_iterate (cgraph_node_ptr, part->nodes, i, node); i++) > + fprintf (dump_file, " %s/%i", cgraph_node_name (node), node->uid); > + fprintf (dump_file, "\n priorities:"); > + for (i = 0; VEC_iterate (priority_ptr, part->priorities, i, p); i++) > + if (p->priority) > + { > + gcc_assert (p->src_partition == part); > + gcc_assert (p->dest_partition != part); > + fprintf (dump_file, " %i:%i", p->dest_partition->index, > + (int)p->priority); > + } > + fprintf (dump_file, "\n"); > + } > + > + static unsigned int > + ipa_func_reorder (void) > + { > + struct cgraph_node *node; > + VEC (partition_ptr, heap) *partitions = VEC_alloc (partition_ptr, heap, > cgraph_max_uid); > + int i; > + struct partition *part, *first_part = NULL; > + int freq; > + fibheap_t heap; > + > + if (!cgraph_max_uid) > + return 0; > + > + heap = fibheap_new (); > + VEC_safe_grow_cleared (partition_ptr, heap, partitions, cgraph_max_uid); > + > + for (node = cgraph_nodes; node; node = node->next) > + if (cgraph_node_will_be_output_p (node)) > + { > + part = XCNEW (struct partition); > + part->index = node->uid; > + VEC_safe_push (cgraph_node_ptr, heap, part->nodes, node); > + VEC_replace (partition_ptr, partitions, node->uid, part); > + } > + > + if (dump_file) > + fprintf (dump_file, "\n\nCreating partitions\n"); > + for (node = cgraph_nodes; node; node = node->next) > + if (cgraph_node_will_be_output_p (node)) > + { > + part = VEC_index (partition_ptr, partitions, node->uid); > + account_callees (part, partitions, node); > + clear_callees (node); > + if (dump_file) > + dump_partition (part); > + } > + for (node = cgraph_nodes; node; node = node->next) > + if (cgraph_node_will_be_output_p (node)) > + { > + struct priority *p; > + part = VEC_index (partition_ptr, partitions, node->uid); > + > + for (i = 0; VEC_iterate (priority_ptr, part->priorities, i, p); i++) > + p->node = fibheap_insert (heap, INT_MAX - p->priority, p); > + if (dump_file) > + dump_partition (part); > + } > + > + if (dump_file) > + fprintf (dump_file, "\n\nMerging by priorities\n"); > + while (!fibheap_empty (heap)) > + { > + struct priority *p = (struct priority *)fibheap_extract_min (heap); > + struct partition *part = p->src_partition; > + p->node = NULL; > + if (dump_file) > + { > + fprintf (dump_file, > + "Concatenating partitions %i and %i, priority %i\n", > + p->src_partition->index, > + p->dest_partition->index, > + (int)p->priority); > + if (dump_file) > + dump_partition (p->src_partition); > + if (dump_file) > + dump_partition (p->dest_partition); > + } > + merge_partitions (partitions, heap, p->src_partition, > p->dest_partition); > + if (dump_file) > + dump_partition (part); > + } > + > + /* We ran out of calls to merge. Try to arrange remaining partitions > + approximately in execution order: static constructors first, followed > + by partition containing function main () > + followed by hot sections of the program. */ > + if (dump_file) > + fprintf (dump_file, "\n\nLooking for static constructors\n"); > + for (i = 0; VEC_iterate (partition_ptr, partitions, i, part); i++) > + if (part && part != first_part > + && DECL_STATIC_CONSTRUCTOR (VEC_index (cgraph_node_ptr, > + part->nodes, 0)->decl)) > + { > + if (dump_file) > + dump_partition (part); > + if (!first_part) > + first_part = part; > + else > + merge_partitions (partitions, heap, first_part, part); > + } > + if (dump_file) > + fprintf (dump_file, "\n\nLooking for main\n"); > + for (i = 0; VEC_iterate (partition_ptr, partitions, i, part); i++) > + if (part && part != first_part > + && MAIN_NAME_P (DECL_NAME (VEC_index (cgraph_node_ptr, > + part->nodes, 0)->decl))) > + { > + if (dump_file) > + dump_partition (part); > + if (!first_part) > + first_part = part; > + else > + merge_partitions (partitions, heap, first_part, part); > + } > + if (dump_file) > + fprintf (dump_file, "\n\nMerging by frequency\n"); > + for (freq = NODE_FREQUENCY_HOT; freq >= NODE_FREQUENCY_UNLIKELY_EXECUTED; > + freq--) > + { > + for (i = 0; VEC_iterate (partition_ptr, partitions, i, part); i++) > + if (part && part != first_part > + && VEC_index (cgraph_node_ptr, part->nodes, 0)->frequency == freq) > + { > + if (dump_file) > + dump_partition (part); > + if (!first_part) > + first_part = part; > + else > + merge_partitions (partitions, heap, first_part, part); > + } > + } > + for (i = 0; VEC_iterate (partition_ptr, partitions, i, part); i++) > + gcc_assert (!part || part == first_part); > + > + fibheap_delete (heap); > + if (!first_part) > + return 0; > + if (dump_file) > + { > + fprintf (dump_file, "\n\nFinal order\n"); > + dump_partition (first_part); > + } > + > + /* Store the resulting order and kill the single remaining partition. */ > + for (i = 0; VEC_iterate (cgraph_node_ptr, first_part->nodes, i, node); > i++) > + node->order = i; > + VEC_free (priority_ptr, heap, first_part->priorities); > + VEC_free (cgraph_node_ptr, heap, first_part->nodes); > + free (first_part); > + return 0; > + } > + > + static bool > + gate_ipa_func_reorder (void) > + { > + return optimize && flag_ipa_reorder && flag_toplevel_reorder; > + } > + > + struct ipa_opt_pass_d pass_ipa_func_reorder = > + { > + { > + IPA_PASS, > + "reorder", /* name */ > + gate_ipa_func_reorder, /* gate */ > + ipa_func_reorder, /* execute */ > + NULL, /* sub */ > + NULL, /* next */ > + 0, /* static_pass_number */ > + TV_CGRAPHOPT, /* tv_id */ > + 0, /* properties_required */ > + 0, /* properties_provided */ > + 0, /* properties_destroyed */ > + 0, /* todo_flags_start */ > + 0 /* todo_flags_finish */ > + }, > + NULL, /* generate_summary */ > + NULL, /* write_summary */ > + NULL, /* read_summary */ > + NULL, /* write_optimization_summary > */ > + NULL, /* read_optimization_summary > */ > + NULL, /* stmt_fixup */ > + 0, /* TODOs */ > + NULL, /* function_transform */ > + NULL /* variable_transform */ > + }; > Index: Makefile.in > =================================================================== > *** Makefile.in (revision 164689) > --- Makefile.in (working copy) > *************** OBJS-archive = \ > *** 1461,1466 **** > --- 1461,1467 ---- > ipa-type-escape.o \ > ipa-utils.o \ > ipa.o \ > + ipa-reorder.o \ > matrix-reorg.o \ > prefix.o \ > tree-inline.o \ > *************** varpool.o : varpool.c $(CONFIG_H) $(SYST > *** 3012,3017 **** > --- 3013,3020 ---- > $(TREE_FLOW_H) gt-varpool.h > ipa.o : ipa.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(CGRAPH_H) \ > $(TREE_PASS_H) $(TIMEVAR_H) $(GIMPLE_H) $(GGC_H) pointer-set.h > + ipa-reorder.o : ipa-reorder.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \ > + $(CGRAPH_H) $(TREE_PASS_H) $(TIMEVAR_H) $(GIMPLE_H) $(GGC_H) > pointer-set.h > ipa-prop.o : ipa-prop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \ > langhooks.h $(GGC_H) $(TARGET_H) $(CGRAPH_H) $(IPA_PROP_H) > $(DIAGNOSTIC_H) \ > $(TREE_FLOW_H) $(TM_H) $(TREE_PASS_H) $(FLAGS_H) $(TREE_H) \ > Index: passes.c > =================================================================== > *** passes.c (revision 164689) > --- passes.c (working copy) > *************** init_optimization_passes (void) > *** 818,823 **** > --- 818,824 ---- > NEXT_PASS (pass_ipa_type_escape); > NEXT_PASS (pass_ipa_pta); > NEXT_PASS (pass_ipa_struct_reorg); > + NEXT_PASS (pass_ipa_func_reorder); > *p = NULL; > > p = &all_lto_gen_passes;