RE: [PATCH] Enable bbro for -Os

Zhenqiang Chen Thu, 06 Sep 2012 02:57:32 -0700

Thank you for the comments.

> -----Original Message-----
> From: Eric Botcazou [mailto:ebotca...@adacore.com]
> Sent: Wednesday, September 05, 2012 9:39 PM
> To: Zhenqiang Chen
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Enable bbro for -Os
> 
> > Basic block reordering is disabled for -Os from gcc 4.7 since the pass
> > will lead to big code size regression. But benchmarks logs also show
> > there are lots of regression due to poor code layout compared with 4.6.
> >
> > The patch is to enable bbro for -Os. When optimizing for size, it
> > * avoid duplicating block.
> > * keep its original order if there is no chance to fall through.
> > * ignore edge frequency and probability.
> > * handle predecessor first if its index is smaller to break long trace.
> > * only connect Trace n with Trace n + 1 to reduce long jump.
> >
> > Here are the CSiBE code size benchmark results:
> > * For ARM, code size reduces 0.21%.
> > * For MIPS, code size reduces 0.25%.
> > * For PPC, code size reduces 0.33%.
> > * For X86, code size reduces 0.22%.
> 
> Interesting figures.  The patch looks good overall but, since the -Os path
> deviates substantially from the original algorithm, it needs to be clearly
> documented in the comment at the top of the file (before "Reference"),
e.g.
> 
> "The above description is for the full algorithm, which is used when the
> function is optimized for speed.  When the function is optimized for size,
in
> order to <...insert reasons here...>, the algorithm is modified as
follows:
> <...list modifications here...>"


Add the following comments:
+   The above description is for the full algorithm, which is used when the
+   function is optimized for speed.  When the function is optimized for
size,
+   in order to reduce long jump and connect more fall through edges, the
+   algorithm is modified as follows:
+   (1) Break long trace to short ones.  The trace is broken at a block,
which
+   has multi-predecessors/successors during finding traces.  When
connecting
+   traces, only connect Trace n with Trace n + 1.  This change reduces most
+   long jumps compared with the above algorithm.
+   (2) Ignore the edge probability and frequency for fall through edges.
+   (3) Keep its original order when there is no chance to fall through.
bbro
+   bases on the result of cfg_cleanup, which does lots of optimizations on
cfg.
+   So the order is expected to be kept if no fall through.
+
+   To implement the change for code size optimization, block's index is
+   selected as the key and all traces are found in one round.
 
 
> @@ -558,6 +564,14 @@ find_traces_1_round (int branch_th, int exec_th,
> gcov_type count_th,
>         /* Add all non-selected successors to the heaps.  */
>         FOR_EACH_EDGE (e, ei, bb->succs)
>           {
> +           /* Wait for the predecessors.  */
> +           if ((e == best_edge) && for_size
> +               && (EDGE_COUNT (best_edge->dest->succs) > 1
> +                   || EDGE_COUNT (best_edge->dest->preds) > 1))
> +             {
> +               best_edge = NULL;
> +             }
> +
>             if (e == best_edge
>                 || e->dest == EXIT_BLOCK_PTR
>                 || bb_visited_trace (e->dest))
> 
> I don't really understand what this means and why this is done here.  If
you
> don't want to add the best destination in some cases, why not do it just
> before the loop and explicitly state the reason?  And you don't need
> parentheses.

The code segment should be before the loop. Add the following comments for
the code:

+         /* If the best destination has multiple successors or
predecessors,
+            don't allow it to be added when optimizing for size.  This
makes
+            sure predecessors with smaller index handled before the best
+            destination.  It breaks long trace and reduces long jumps.
+
+            Take if-then-else as an example.
+               A
+              / \
+             B   C
+              \ /
+               D
+            If we do not remove the best edge B->D/C->D.  The final order
is
+            A B D ... C.  C is at the end of the program.  If D or
successors
+            of D are complicated, might need long jump for A->C and C->D.
+            Similar issue for order: A C D ... B.
+
+            After removing the best edge, the final result will be
ABCD/ACBD.
+            It does not add jump compared with the previous order. But it
+            reduce the possibility of long jump.  */
+         if (best_edge && for_size
+             && (EDGE_COUNT (best_edge->dest->succs) > 1
+                || EDGE_COUNT (best_edge->dest->preds) > 1))
+           best_edge = NULL;

All other comments are accepted. 

The updated patch is attached. Is it OK?

Thanks!
-Zhenqiang

Enable-bbro-for-size-updated2.patch
Description: Binary data

RE: [PATCH] Enable bbro for -Os

Reply via email to