On Sun, Oct 6, 2019 at 4:38 PM Jan Hubicka <hubi...@ucw.cz> wrote:
>
> > On 9/19/19 2:33 AM, Martin Liška wrote:
> > > Hi.
> > >
> > > Function reordering has been around for quite some time and a naive
> > > implementation was also part of my diploma thesis some time ago.
> > > Currently, the GCC can reorder function based on first execution, which
> > > happens with PGO and LTO of course. Known limitation is that the order
> > > is preserved only partially as various symbols go into different LTRANS 
> > > partitions.
> > >
> > > There has been some research in the area and I would point out the 
> > > Facebook paper
> > > ([1]) and Sony presentation ([2]). Based on that, I decided to make a new 
> > > implementation
> > > in the GCC that does the same (in a proper way). First part of the 
> > > enablement are patches
> > > to ld.bfd and ld.gold that come up with a new section .text.sorted, that 
> > > is always sorted.
> > > There's a snippet from the modified default linker script:
> > Funny, I was doing this via linker scripts circa ~95, in fact that's why
> > we have -ffunction-sections :-)   We started with scripts which
> > post-processed profiling data to create linker scripts for ELF systems.
> >  We had something for HPUX/SOM as well, but I can't remember what
> > mechanism we used, it may have been a gross level sorting using the SOM
> > section sort key mechanism - it only had 128 or 256 keys with a notable
> > amount of them reserved.
> >
> > We had also built a linker with a basic level of interposition circa
> > 1993 and explored various approaches to reordering executables.  I'd
> > just joined the group at the time and was responsible for wiring up
> > stuff on the OS side, but eventually got the "pleasure" of owning the
> > linker server.  A lot of the C3 algorithmic stuff looks similar to what
> > we did.
>
> For reference I am attaching early LTO version from circa 2010 :)
> >
> > Anyway...
> >
> > I don't see anything objectionable in here.  It's noted as an RFC.  Are
> > you interested in pushing this forward for gcc-10?
>
> I think it is plan to get some form of code layout pass into GCC10.  I
> will test Martin's version on Firefox and see if it have any effect
> here. It is generally quite hard to evaluate those changes (and it is
> reason why I did not moved forward with version form 2010 - more
> precisely it kept falling off the shelf for about a decade)
>
> If LLD has support for cgraph annotations and we support LLD, i think we
> should add support for that, too - it will be very useful to compare
> indvidiual implementations.
> I believe there is gold support (off tree?) for that too and something
> similar is also supported by other toolchains like AIX?
>
> One problem with the sections approach is that every section gets
> aligned to largest code alignment inside of the section. Since our
> alignments are (should be) often cache line based we get cca 30 bytes of
> waste for every function that is quite a lot.
>
> This is why we currently have way to order function when outputting them
> and use that with FDO (using Martin's first execution logic). This has
> drwarback of making the functions to flow in that order through late
> optimizations and RTL backend and thus we lose IPA-RA and some
> IP propagation (late pure/const/nothrow discovery).

But you can also fix that by the parallelization GSoC project approach,
decoupling things at RTL expansion IPA-wise and using output order
for the latter (or even do the fragments in GCC itself by refactoring the
output machinery to use function-local "strings", assembling the final
file later).  Refactoring output to handle multiple output "files" at the
same time might also help getting rid of that early-LTO-debug copying
stuff (now that it seems that linker-plugin extensions for partially claiming
files will never happen...)

> So this approach has a drawback, too. It is why i was trying to push
> myself to get gcc to use gas fragments :)
>
> Anyway, all these issues can be sovled incementally - lets see how
> Maritn's patch work on Firefox and if we can get it tested elsewhere and
> start from that.
>
> I will take a look into the details.
>
> Honza
> >
> > jeff
> >
>
> Index: tree-pass.h
> ===================================================================
> *** tree-pass.h (revision 164689)
> --- tree-pass.h (working copy)
> *************** extern struct ipa_opt_pass_d pass_ipa_lt
> *** 467,472 ****
> --- 467,473 ----
>   extern struct ipa_opt_pass_d pass_ipa_lto_finish_out;
>   extern struct ipa_opt_pass_d pass_ipa_profile;
>   extern struct ipa_opt_pass_d pass_ipa_cdtor_merge;
> + extern struct ipa_opt_pass_d pass_ipa_func_reorder;
>
>   extern struct gimple_opt_pass pass_all_optimizations;
>   extern struct gimple_opt_pass pass_cleanup_cfg_post_optimizing;
> Index: cgraphunit.c
> ===================================================================
> *** cgraphunit.c        (revision 164689)
> --- cgraphunit.c        (working copy)
> *************** cgraph_inline_p (struct cgraph_edge *e,
> *** 1517,1522 ****
> --- 1517,1529 ----
>     return !e->inline_failed;
>   }
>
> + static int
> + node_cmp (const void *pa, const void *pb)
> + {
> +   const struct cgraph_node *a = *(const struct cgraph_node * const *) pa;
> +   const struct cgraph_node *b = *(const struct cgraph_node * const *) pb;
> +   return b->order - a->order;
> + }
>
>
>   /* Expand all functions that must be output.
> *************** cgraph_expand_all_functions (void)
> *** 1546,1551 ****
> --- 1553,1561 ----
>       if (order[i]->process)
>         order[new_order_pos++] = order[i];
>
> +   if (flag_ipa_reorder)
> +     qsort (order, new_order_pos, sizeof (struct cgraph_node *), node_cmp);
> +
>     for (i = new_order_pos - 1; i >= 0; i--)
>       {
>         node = order[i];
> Index: tree-ssa-ccp.c
> ===================================================================
> *** tree-ssa-ccp.c      (revision 164689)
> --- tree-ssa-ccp.c      (working copy)
> *************** ccp_fold_stmt (gimple_stmt_iterator *gsi
> *** 2307,2312 ****
> --- 2307,2324 ----
>                 changed = true;
>               }
>           }
> +       if (TREE_CODE (gimple_call_fn (stmt)) == OBJ_TYPE_REF)
> +         {
> +           tree expr = OBJ_TYPE_REF_EXPR (gimple_call_fn (stmt));
> +           expr = valueize_op (expr);
> +           if (TREE_CODE (expr) == ADDR_EXPR
> +               && TREE_CODE (TREE_OPERAND (expr, 0)) == FUNCTION_DECL)
> +            {
> +              gimple_call_set_fndecl (stmt, TREE_OPERAND (expr, 0));
> +              debug_gimple_stmt (stmt);
> +              changed = true;
> +            }
> +         }
>
>         return changed;
>         }
> Index: lto-cgraph.c
> ===================================================================
> *** lto-cgraph.c        (revision 164689)
> --- lto-cgraph.c        (working copy)
> *************** lto_output_node (struct lto_simple_outpu
> *** 466,471 ****
> --- 466,473 ----
>
>     if (tag == LTO_cgraph_analyzed_node)
>       {
> +       lto_output_uleb128_stream (ob->main_stream,
> +                                node->order);
>         lto_output_sleb128_stream (ob->main_stream,
>                                  
> node->local.inline_summary.estimated_self_stack_size);
>         lto_output_sleb128_stream (ob->main_stream,
> *************** input_overwrite_node (struct lto_file_de
> *** 927,932 ****
> --- 929,935 ----
>                       struct cgraph_node *node,
>                       enum LTO_cgraph_tags tag,
>                       struct bitpack_d *bp,
> +                     int order,
>                       unsigned int stack_size,
>                       unsigned int self_time,
>                       unsigned int time_inlining_benefit,
> *************** input_overwrite_node (struct lto_file_de
> *** 935,940 ****
> --- 938,944 ----
>                       enum ld_plugin_symbol_resolution resolution)
>   {
>     node->aux = (void *) tag;
> +   node->order = order;
>     node->local.inline_summary.estimated_self_stack_size = stack_size;
>     node->local.inline_summary.self_time = self_time;
>     node->local.inline_summary.time_inlining_benefit = time_inlining_benefit;
> *************** input_node (struct lto_file_decl_data *f
> *** 1027,1032 ****
> --- 1031,1037 ----
>     unsigned long same_body_count = 0;
>     int clone_ref;
>     enum ld_plugin_symbol_resolution resolution;
> +   int order = 0;
>
>     clone_ref = lto_input_sleb128 (ib);
>
> *************** input_node (struct lto_file_decl_data *f
> *** 1045,1050 ****
> --- 1050,1056 ----
>
>     if (tag == LTO_cgraph_analyzed_node)
>       {
> +       order = lto_input_uleb128 (ib);
>         stack_size = lto_input_sleb128 (ib);
>         self_size = lto_input_sleb128 (ib);
>         size_inlining_benefit = lto_input_sleb128 (ib);
> *************** input_node (struct lto_file_decl_data *f
> *** 1066,1072 ****
>
>     bp = lto_input_bitpack (ib);
>     resolution = (enum ld_plugin_symbol_resolution)lto_input_uleb128 (ib);
> !   input_overwrite_node (file_data, node, tag, &bp, stack_size, self_time,
>                         time_inlining_benefit, self_size,
>                         size_inlining_benefit, resolution);
>
> --- 1072,1078 ----
>
>     bp = lto_input_bitpack (ib);
>     resolution = (enum ld_plugin_symbol_resolution)lto_input_uleb128 (ib);
> !   input_overwrite_node (file_data, node, tag, &bp, order, stack_size, 
> self_time,
>                         time_inlining_benefit, self_size,
>                         size_inlining_benefit, resolution);
>
> Index: gimple-fold.c
> ===================================================================
> *** gimple-fold.c       (revision 164689)
> --- gimple-fold.c       (working copy)
> *************** gimple_fold_obj_type_ref (tree ref, tree
> *** 1465,1470 ****
> --- 1465,1480 ----
>     tree known_binfo = known_type ? TYPE_BINFO (known_type) : NULL_TREE;
>     tree binfo;
>
> +   if (TREE_CODE (obj) == SSA_NAME)
> +     {
> +       gimple def_stmt = SSA_NAME_DEF_STMT (obj);
> +       if (def_stmt
> +         && gimple_code (def_stmt) == GIMPLE_ASSIGN
> +         && (gimple_assign_single_p (def_stmt)
> +             || (gimple_assign_unary_nop_p (def_stmt))))
> +       obj = gimple_assign_rhs1 (def_stmt);
> +     }
> +
>     if (TREE_CODE (obj) == ADDR_EXPR)
>       obj = TREE_OPERAND (obj, 0);
>
> *************** fold_gimple_call (gimple_stmt_iterator *
> *** 1513,1520 ****
>            copying EH region info to the new node.  Easier to just do it
>            here where we can just smash the call operand.  */
>         callee = gimple_call_fn (stmt);
> !       if (TREE_CODE (callee) == OBJ_TYPE_REF
> !           && TREE_CODE (OBJ_TYPE_REF_OBJECT (callee)) == ADDR_EXPR)
>           {
>             tree t;
>
> --- 1523,1529 ----
>            copying EH region info to the new node.  Easier to just do it
>            here where we can just smash the call operand.  */
>         callee = gimple_call_fn (stmt);
> !       if (TREE_CODE (callee) == OBJ_TYPE_REF)
>           {
>             tree t;
>
> Index: common.opt
> ===================================================================
> *** common.opt  (revision 164689)
> --- common.opt  (working copy)
> *************** fdse
> *** 1107,1112 ****
> --- 1107,1116 ----
>   Common Var(flag_dse) Init(1) Optimization
>   Use the RTL dead store elimination pass
>
> + freorder-functions
> + Common Report Var(flag_ipa_reorder) Init(1) Optimization
> + Reroder functions for better locality at execution time
> +
>   freschedule-modulo-scheduled-loops
>   Common Report Var(flag_resched_modulo_sched) Optimization
>   Enable/Disable the traditional scheduling in loops that already passed 
> modulo scheduling
> Index: ipa-reorder.c
> ===================================================================
> *** ipa-reorder.c       (revision 0)
> --- ipa-reorder.c       (revision 0)
> ***************
> *** 0 ****
> --- 1,567 ----
> + /* IPA function reordering pass.
> +    Copyright (C) 2010 Free Software Foundation, Inc.
> +    Contributed by Jan Hubicka  <j...@suse.cz>
> +
> + This file is part of GCC.
> +
> + GCC is free software; you can redistribute it and/or modify it under
> + the terms of the GNU General Public License as published by the Free
> + Software Foundation; either version 3, or (at your option) any later
> + version.
> +
> + GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> + WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> + for more details.
> +
> + You should have received a copy of the GNU General Public License
> + along with GCC; see the file COPYING3.  If not see
> + <http://www.gnu.org/licenses/>.  */
> +
> + #include "config.h"
> + #include "system.h"
> + #include "coretypes.h"
> + #include "tm.h"
> + #include "cgraph.h"
> + #include "tree-pass.h"
> + #include "timevar.h"
> + #include "gimple.h"
> + #include "ggc.h"
> + #include "flags.h"
> + #include "pointer-set.h"
> + #include "target.h"
> + #include "tree-iterator.h"
> + #include "fibheap.h"
> +
> + struct priority
> + {
> +   gcov_type priority;
> +   struct partition *src_partition;
> +   struct partition *dest_partition;
> +   fibnode_t node;
> + };
> +
> + typedef struct priority *priority_ptr;
> +
> + DEF_VEC_P(priority_ptr);
> + DEF_VEC_ALLOC_P(priority_ptr,heap);
> +
> + struct partition
> + {
> +   int index;
> +   VEC(cgraph_node_ptr, heap) *nodes;
> +   VEC(priority_ptr, heap) *priorities;
> +   VEC(priority_ptr, heap) *in_priorities;
> + };
> +
> + typedef struct partition *partition_ptr;
> +
> + DEF_VEC_P(partition_ptr);
> + DEF_VEC_ALLOC_P(partition_ptr,heap);
> +
> + static bool
> + cgraph_node_will_be_output_p (struct cgraph_node *node)
> + {
> +   return (node->analyzed && !node->global.inlined_to);
> + }
> +
> + static void
> + account_callees (struct partition *part,
> +                VEC (partition_ptr, heap) *partitions,
> +                struct cgraph_node *node)
> + {
> +   struct cgraph_edge *edge;
> +   struct priority *p;
> +   int i;
> +   struct ipa_ref *ref;
> +
> +   for (edge = node->callees; edge; edge = edge->next_callee)
> +     if (edge->inline_failed)
> +       {
> +       if (!edge->callee->aux)
> +         {
> +           struct partition *dest = VEC_index (partition_ptr, partitions,
> +                                               edge->callee->uid);
> +           if (dest && dest != part)
> +             {
> +               p = XCNEW (struct priority);
> +               edge->callee->aux = p;
> +               p->src_partition = part;
> +               p->dest_partition = dest;
> +               VEC_safe_push (priority_ptr, heap, part->priorities, p);
> +               VEC_safe_push (priority_ptr, heap, dest->in_priorities, p);
> +             }
> +           else
> +             continue;
> +         }
> +       else
> +         p = (struct priority *)edge->callee->aux;
> +       if (edge->count)
> +         p->priority += edge->count + 1;
> +       else
> +         {
> +           int mul = 1;
> +           switch (edge->caller->frequency)
> +             {
> +             case NODE_FREQUENCY_UNLIKELY_EXECUTED: mul = 0; break;
> +             case NODE_FREQUENCY_EXECUTED_ONCE: mul = 1; break;
> +             case NODE_FREQUENCY_NORMAL: mul = 10; break;
> +             case NODE_FREQUENCY_HOT: mul = 1000; break;
> +             }
> +           p->priority += edge->frequency * mul + 1;
> +         }
> +       }
> +     else
> +       account_callees (part, partitions, edge->callee);
> +
> +   /* Account taking address of a function as possible call with a low 
> priority.  */
> +   for (i = 0; ipa_ref_list_reference_iterate (&node->ref_list, i, ref); i++)
> +     if (ref->refered_type == IPA_REF_CGRAPH)
> +       {
> +       struct cgraph_node *node2 = ipa_ref_node (ref);
> +       if (!node2->aux)
> +         {
> +           struct partition *dest = VEC_index (partition_ptr, partitions,
> +                                               node2->uid);
> +           if (dest && dest != part)
> +             {
> +               p = XCNEW (struct priority);
> +               node2->aux = p;
> +               p->src_partition = part;
> +               p->dest_partition = dest;
> +               VEC_safe_push (priority_ptr, heap, part->priorities, p);
> +               VEC_safe_push (priority_ptr, heap, dest->in_priorities, p);
> +             }
> +           else
> +             continue;
> +         }
> +       else
> +         p = (struct priority *)node2->aux;
> +       p->priority++;
> +       }
> + }
> +
> + static void
> + clear_callees (struct cgraph_node *node)
> + {
> +   struct cgraph_edge *edge;
> +   int i;
> +   struct ipa_ref *ref;
> +
> +   for (edge = node->callees; edge; edge = edge->next_callee)
> +     if (edge->inline_failed)
> +       edge->callee->aux = NULL;
> +     else
> +       clear_callees (edge->callee);
> +   for (i = 0; ipa_ref_list_reference_iterate (&node->ref_list, i, ref); i++)
> +     if (ref->refered_type == IPA_REF_CGRAPH)
> +       ipa_ref_node (ref)->aux = NULL;
> + }
> +
> + /* Compare two priority vector entries via indexes of destination
> +    partitions.  */
> +
> + static int
> + priority_cmp (const void *pa, const void *pb)
> + {
> +   const struct priority *a = *(const struct priority * const *) pa;
> +   const struct priority *b = *(const struct priority * const *) pb;
> +   /* Priorities pointing to dead partitions are removed lazily,
> +      so just avoid referencing dead memory.  */
> +   if (!a->priority)
> +     return ! b->priority ? 0 : -1;
> +   if (!b->priority)
> +     return 1;
> +   return a->dest_partition->index - b->dest_partition->index;
> + }
> +
> + /* Compare two priority vector entries via indexes of destination
> +    partitions.  */
> +
> + static int
> + in_priority_cmp (const void *pa, const void *pb)
> + {
> +   const struct priority *a = *(const struct priority * const *) pa;
> +   const struct priority *b = *(const struct priority * const *) pb;
> +   /* Priorities pointing to dead partitions are removed lazily,
> +      so just avoid referencing dead memory.  */
> +   if (!a->priority)
> +     return ! b->priority ? 0 : -1;
> +   if (!b->priority)
> +     return 1;
> +   return a->src_partition->index - b->src_partition->index;
> + }
> +
> + static void
> + delete_priority (fibheap_t heap, struct priority *p)
> + {
> +   p->priority = 0;
> +   p->src_partition = p->dest_partition = NULL;
> +   if (p->node)
> +     fibheap_delete_node (heap, p->node);
> +   p->node = NULL;
> + }
> +
> + static void
> + merge_partitions (VEC (partition_ptr, heap) *partitions,
> +                 fibheap_t heap,
> +                 struct partition *part1,
> +                 struct partition *part2)
> + {
> +   VEC (priority_ptr, heap) *priorities = NULL;
> +   VEC (priority_ptr, heap) *in_priorities = NULL;
> +   unsigned int i1 = 0, i2 = 0;
> +   struct cgraph_node *node;
> +   int i;
> +
> +   /* We keep priority lists ordered by indexes of destination partitions
> +      to allow effective merging.  */
> +   qsort (VEC_address (priority_ptr, part1->priorities),
> +        VEC_length (priority_ptr, part1->priorities),
> +        sizeof (priority_ptr),
> +        priority_cmp);
> +   qsort (VEC_address (priority_ptr, part2->priorities),
> +        VEC_length (priority_ptr, part2->priorities),
> +        sizeof (priority_ptr),
> +        priority_cmp);
> +
> +   /* Merge priority lists, prune out references of partition to itself.
> +      Assume priority lists are ordered by indexes of destination partitions
> +      and keep them so.  */
> +   while (1)
> +     {
> +       struct priority *p1, *p2;
> +
> +       while (i1 < VEC_length (priority_ptr, part1->priorities)
> +            && !VEC_index (priority_ptr, part1->priorities, i1)->priority)
> +       i1++;
> +       while (i2 < VEC_length (priority_ptr, part2->priorities)
> +            && !VEC_index (priority_ptr, part2->priorities, i2)->priority)
> +       i2++;
> +
> +       if (i1 == VEC_length (priority_ptr, part1->priorities)
> +         && i2 == VEC_length (priority_ptr, part2->priorities))
> +       break;
> +
> +       if (i1 < VEC_length (priority_ptr, part1->priorities)
> +         && (i2 == VEC_length (priority_ptr, part2->priorities)
> +             || (VEC_index (priority_ptr, part1->priorities,
> +                            i1)->dest_partition->index
> +                 < VEC_index (priority_ptr, part2->priorities,
> +                              i2)->dest_partition->index)))
> +       {
> +         p1 = VEC_index (priority_ptr, part1->priorities, i1);
> +         if (p1->dest_partition != part2)
> +           VEC_safe_push (priority_ptr, heap, priorities, p1);
> +         else
> +           delete_priority (heap, p1);
> +         i1++;
> +       }
> +       else if (i2 < VEC_length (priority_ptr, part2->priorities)
> +              && (i1 == VEC_length (priority_ptr, part1->priorities)
> +                  || (VEC_index (priority_ptr, part1->priorities,
> +                                 i1)->dest_partition->index
> +                      > VEC_index (priority_ptr, part2->priorities,
> +                                   i2)->dest_partition->index)))
> +       {
> +         p2 = VEC_index (priority_ptr, part2->priorities, i2);
> +         if (p2->dest_partition != part1)
> +           {
> +             VEC_safe_push (priority_ptr, heap, priorities, p2);
> +             p2->src_partition = part1;
> +           }
> +         else
> +           delete_priority (heap, p2);
> +         i2++;
> +       }
> +       else
> +       {
> +         p1 = VEC_index (priority_ptr, part1->priorities, i1);
> +         p2 = VEC_index (priority_ptr, part2->priorities, i2);
> +         p1->priority += p2->priority;
> +         fibheap_replace_key (heap, p1->node, INT_MAX - p1->priority);
> +         delete_priority (heap, p2);
> +         VEC_safe_push (priority_ptr, heap, priorities, p1);
> +         i1++;
> +         i2++;
> +       }
> +     }
> +   VEC_free (priority_ptr, heap, part1->priorities);
> +   part1->priorities = priorities;
> +
> +   qsort (VEC_address (priority_ptr, part1->in_priorities),
> +        VEC_length (priority_ptr, part1->in_priorities),
> +        sizeof (priority_ptr),
> +        in_priority_cmp);
> +   qsort (VEC_address (priority_ptr, part2->in_priorities),
> +        VEC_length (priority_ptr, part2->in_priorities),
> +        sizeof (priority_ptr),
> +        in_priority_cmp);
> +   i1 = 0;
> +   i2 = 0;
> +   while (1)
> +     {
> +       struct priority *p1, *p2;
> +       while (i1 < VEC_length (priority_ptr, part1->in_priorities)
> +            && !VEC_index (priority_ptr, part1->in_priorities, i1)->priority)
> +       i1++;
> +       while (i2 < VEC_length (priority_ptr, part2->in_priorities)
> +            && !VEC_index (priority_ptr, part2->in_priorities, i2)->priority)
> +       i2++;
> +
> +       if (i1 == VEC_length (priority_ptr, part1->in_priorities)
> +         && i2 == VEC_length (priority_ptr, part2->in_priorities))
> +       break;
> +
> +       if (i1 < VEC_length (priority_ptr, part1->in_priorities)
> +         && (i2 == VEC_length (priority_ptr, part2->in_priorities)
> +             || (VEC_index (priority_ptr, part1->in_priorities,
> +                            i1)->src_partition->index
> +                 < VEC_index (priority_ptr, part2->in_priorities,
> +                              i2)->src_partition->index)))
> +       {
> +         p1 = VEC_index (priority_ptr, part1->in_priorities, i1);
> +         if (p1->src_partition != part2)
> +           VEC_safe_push (priority_ptr, heap, in_priorities, p1);
> +         else
> +           delete_priority (heap, p1);
> +         i1++;
> +       }
> +       else if (i2 < VEC_length (priority_ptr, part2->in_priorities)
> +              && (i1 == VEC_length (priority_ptr, part1->in_priorities)
> +                  || (VEC_index (priority_ptr, part1->in_priorities,
> +                                 i1)->src_partition->index
> +                      > VEC_index (priority_ptr, part2->in_priorities,
> +                                   i2)->src_partition->index)))
> +       {
> +         p2 = VEC_index (priority_ptr, part2->in_priorities, i2);
> +         if (p2->src_partition != part1)
> +           {
> +             VEC_safe_push (priority_ptr, heap, in_priorities, p2);
> +             p2->dest_partition = part1;
> +           }
> +         else
> +           delete_priority (heap, p2);
> +         i2++;
> +       }
> +       else
> +       {
> +         p1 = VEC_index (priority_ptr, part1->in_priorities, i1);
> +         p2 = VEC_index (priority_ptr, part2->in_priorities, i2);
> +         p1->priority += p2->priority;
> +         fibheap_replace_key (heap, p1->node, INT_MAX - p1->priority);
> +         delete_priority (heap, p2);
> +         VEC_safe_push (priority_ptr, heap, in_priorities, p1);
> +         i1++;
> +         i2++;
> +       }
> +     }
> +   VEC_free (priority_ptr, heap, part1->in_priorities);
> +   part1->in_priorities = in_priorities;
> +
> +   for (i = 0; VEC_iterate (cgraph_node_ptr, part2->nodes, i, node); i++)
> +     VEC_safe_push (cgraph_node_ptr, heap, part1->nodes, node);
> +
> +   VEC_replace (partition_ptr, partitions, part2->index, NULL);
> +   VEC_free (priority_ptr, heap, part2->priorities);
> +   VEC_free (priority_ptr, heap, part2->in_priorities);
> +   VEC_free (cgraph_node_ptr, heap, part2->nodes);
> +   free (part2);
> + }
> +
> + static void
> + dump_partition (struct partition *part)
> + {
> +   int i;
> +   struct cgraph_node *node;
> +   struct priority *p;
> +   fprintf (dump_file, "  Partition %i:", part->index);
> +   for (i = 0; VEC_iterate (cgraph_node_ptr, part->nodes, i, node); i++)
> +     fprintf (dump_file, "  %s/%i", cgraph_node_name (node), node->uid);
> +   fprintf (dump_file, "\n    priorities:");
> +   for (i = 0; VEC_iterate (priority_ptr, part->priorities, i, p); i++)
> +     if (p->priority)
> +       {
> +       gcc_assert (p->src_partition == part);
> +       gcc_assert (p->dest_partition != part);
> +       fprintf (dump_file, "  %i:%i", p->dest_partition->index,
> +                (int)p->priority);
> +       }
> +   fprintf (dump_file, "\n");
> + }
> +
> + static unsigned int
> + ipa_func_reorder (void)
> + {
> +   struct cgraph_node *node;
> +   VEC (partition_ptr, heap) *partitions = VEC_alloc (partition_ptr, heap, 
> cgraph_max_uid);
> +   int i;
> +   struct partition *part, *first_part = NULL;
> +   int freq;
> +   fibheap_t heap;
> +
> +   if (!cgraph_max_uid)
> +     return 0;
> +
> +   heap = fibheap_new ();
> +   VEC_safe_grow_cleared (partition_ptr, heap, partitions, cgraph_max_uid);
> +
> +   for (node = cgraph_nodes; node; node = node->next)
> +     if (cgraph_node_will_be_output_p (node))
> +       {
> +       part = XCNEW (struct partition);
> +       part->index = node->uid;
> +       VEC_safe_push (cgraph_node_ptr, heap, part->nodes, node);
> +       VEC_replace (partition_ptr, partitions, node->uid, part);
> +       }
> +
> +   if (dump_file)
> +     fprintf (dump_file, "\n\nCreating partitions\n");
> +   for (node = cgraph_nodes; node; node = node->next)
> +     if (cgraph_node_will_be_output_p (node))
> +       {
> +       part = VEC_index (partition_ptr, partitions, node->uid);
> +       account_callees (part, partitions, node);
> +       clear_callees (node);
> +       if (dump_file)
> +         dump_partition (part);
> +       }
> +   for (node = cgraph_nodes; node; node = node->next)
> +     if (cgraph_node_will_be_output_p (node))
> +       {
> +       struct priority *p;
> +       part = VEC_index (partition_ptr, partitions, node->uid);
> +
> +       for (i = 0; VEC_iterate (priority_ptr, part->priorities, i, p); i++)
> +         p->node = fibheap_insert (heap, INT_MAX - p->priority, p);
> +       if (dump_file)
> +         dump_partition (part);
> +       }
> +
> +   if (dump_file)
> +     fprintf (dump_file, "\n\nMerging by priorities\n");
> +   while (!fibheap_empty (heap))
> +     {
> +       struct priority *p = (struct priority *)fibheap_extract_min (heap);
> +       struct partition *part = p->src_partition;
> +       p->node = NULL;
> +       if (dump_file)
> +       {
> +         fprintf (dump_file,
> +                  "Concatenating partitions %i and %i, priority %i\n",
> +                  p->src_partition->index,
> +                  p->dest_partition->index,
> +                  (int)p->priority);
> +         if (dump_file)
> +           dump_partition (p->src_partition);
> +         if (dump_file)
> +           dump_partition (p->dest_partition);
> +       }
> +       merge_partitions (partitions, heap, p->src_partition, 
> p->dest_partition);
> +       if (dump_file)
> +       dump_partition (part);
> +     }
> +
> +   /* We ran out of calls to merge.  Try to arrange remaining partitions
> +      approximately in execution order: static constructors first, followed
> +      by partition containing function main ()
> +      followed by hot sections of the program.  */
> +   if (dump_file)
> +     fprintf (dump_file, "\n\nLooking for static constructors\n");
> +   for (i = 0; VEC_iterate (partition_ptr, partitions, i, part); i++)
> +     if (part && part != first_part
> +       && DECL_STATIC_CONSTRUCTOR (VEC_index (cgraph_node_ptr,
> +                                              part->nodes, 0)->decl))
> +       {
> +       if (dump_file)
> +         dump_partition (part);
> +       if (!first_part)
> +        first_part = part;
> +       else
> +        merge_partitions (partitions, heap, first_part, part);
> +       }
> +   if (dump_file)
> +     fprintf (dump_file, "\n\nLooking for main\n");
> +   for (i = 0; VEC_iterate (partition_ptr, partitions, i, part); i++)
> +     if (part && part != first_part
> +       && MAIN_NAME_P (DECL_NAME (VEC_index (cgraph_node_ptr,
> +                                             part->nodes, 0)->decl)))
> +       {
> +       if (dump_file)
> +         dump_partition (part);
> +       if (!first_part)
> +        first_part = part;
> +       else
> +        merge_partitions (partitions, heap, first_part, part);
> +       }
> +   if (dump_file)
> +     fprintf (dump_file, "\n\nMerging by frequency\n");
> +   for (freq = NODE_FREQUENCY_HOT; freq >= NODE_FREQUENCY_UNLIKELY_EXECUTED;
> +        freq--)
> +     {
> +       for (i = 0; VEC_iterate (partition_ptr, partitions, i, part); i++)
> +       if (part && part != first_part
> +           && VEC_index (cgraph_node_ptr, part->nodes, 0)->frequency == freq)
> +         {
> +           if (dump_file)
> +             dump_partition (part);
> +           if (!first_part)
> +            first_part = part;
> +           else
> +            merge_partitions (partitions, heap, first_part, part);
> +         }
> +     }
> +   for (i = 0; VEC_iterate (partition_ptr, partitions, i, part); i++)
> +     gcc_assert (!part || part == first_part);
> +
> +   fibheap_delete (heap);
> +   if (!first_part)
> +     return 0;
> +   if (dump_file)
> +     {
> +       fprintf (dump_file, "\n\nFinal order\n");
> +       dump_partition (first_part);
> +     }
> +
> +   /* Store the resulting order and kill the single remaining partition.  */
> +   for (i = 0; VEC_iterate (cgraph_node_ptr, first_part->nodes, i, node); 
> i++)
> +     node->order = i;
> +   VEC_free (priority_ptr, heap, first_part->priorities);
> +   VEC_free (cgraph_node_ptr, heap, first_part->nodes);
> +   free (first_part);
> +   return 0;
> + }
> +
> + static bool
> + gate_ipa_func_reorder (void)
> + {
> +   return optimize && flag_ipa_reorder && flag_toplevel_reorder;
> + }
> +
> + struct ipa_opt_pass_d pass_ipa_func_reorder =
> + {
> +  {
> +   IPA_PASS,
> +   "reorder",                          /* name */
> +   gate_ipa_func_reorder,              /* gate */
> +   ipa_func_reorder,                   /* execute */
> +   NULL,                                       /* sub */
> +   NULL,                                       /* next */
> +   0,                                  /* static_pass_number */
> +   TV_CGRAPHOPT,                               /* tv_id */
> +   0,                                  /* properties_required */
> +   0,                                  /* properties_provided */
> +   0,                                  /* properties_destroyed */
> +   0,                                  /* todo_flags_start */
> +   0                                     /* todo_flags_finish */
> +  },
> +  NULL,                                        /* generate_summary */
> +  NULL,                                        /* write_summary */
> +  NULL,                                        /* read_summary */
> +  NULL,                                        /* write_optimization_summary 
> */
> +  NULL,                                        /* read_optimization_summary 
> */
> +  NULL,                                        /* stmt_fixup */
> +  0,                                   /* TODOs */
> +  NULL,                                        /* function_transform */
> +  NULL                                 /* variable_transform */
> + };
> Index: Makefile.in
> ===================================================================
> *** Makefile.in (revision 164689)
> --- Makefile.in (working copy)
> *************** OBJS-archive = \
> *** 1461,1466 ****
> --- 1461,1467 ----
>         ipa-type-escape.o \
>         ipa-utils.o \
>         ipa.o \
> +       ipa-reorder.o \
>         matrix-reorg.o \
>         prefix.o \
>         tree-inline.o \
> *************** varpool.o : varpool.c $(CONFIG_H) $(SYST
> *** 3012,3017 ****
> --- 3013,3020 ----
>      $(TREE_FLOW_H) gt-varpool.h
>   ipa.o : ipa.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(CGRAPH_H) \
>      $(TREE_PASS_H) $(TIMEVAR_H) $(GIMPLE_H) $(GGC_H) pointer-set.h
> + ipa-reorder.o : ipa-reorder.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
> +    $(CGRAPH_H) $(TREE_PASS_H) $(TIMEVAR_H) $(GIMPLE_H) $(GGC_H) 
> pointer-set.h
>   ipa-prop.o : ipa-prop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
>      langhooks.h $(GGC_H) $(TARGET_H) $(CGRAPH_H) $(IPA_PROP_H) 
> $(DIAGNOSTIC_H) \
>      $(TREE_FLOW_H) $(TM_H) $(TREE_PASS_H) $(FLAGS_H) $(TREE_H) \
> Index: passes.c
> ===================================================================
> *** passes.c    (revision 164689)
> --- passes.c    (working copy)
> *************** init_optimization_passes (void)
> *** 818,823 ****
> --- 818,824 ----
>     NEXT_PASS (pass_ipa_type_escape);
>     NEXT_PASS (pass_ipa_pta);
>     NEXT_PASS (pass_ipa_struct_reorg);
> +   NEXT_PASS (pass_ipa_func_reorder);
>     *p = NULL;
>
>     p = &all_lto_gen_passes;

Reply via email to