Thank you for the comments. > -----Original Message----- > From: Eric Botcazou [mailto:ebotca...@adacore.com] > Sent: Wednesday, September 05, 2012 9:39 PM > To: Zhenqiang Chen > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] Enable bbro for -Os > > > Basic block reordering is disabled for -Os from gcc 4.7 since the pass > > will lead to big code size regression. But benchmarks logs also show > > there are lots of regression due to poor code layout compared with 4.6. > > > > The patch is to enable bbro for -Os. When optimizing for size, it > > * avoid duplicating block. > > * keep its original order if there is no chance to fall through. > > * ignore edge frequency and probability. > > * handle predecessor first if its index is smaller to break long trace. > > * only connect Trace n with Trace n + 1 to reduce long jump. > > > > Here are the CSiBE code size benchmark results: > > * For ARM, code size reduces 0.21%. > > * For MIPS, code size reduces 0.25%. > > * For PPC, code size reduces 0.33%. > > * For X86, code size reduces 0.22%. > > Interesting figures. The patch looks good overall but, since the -Os path > deviates substantially from the original algorithm, it needs to be clearly > documented in the comment at the top of the file (before "Reference"), e.g. > > "The above description is for the full algorithm, which is used when the > function is optimized for speed. When the function is optimized for size, in > order to <...insert reasons here...>, the algorithm is modified as follows: > <...list modifications here...>"
Add the following comments: + The above description is for the full algorithm, which is used when the + function is optimized for speed. When the function is optimized for size, + in order to reduce long jump and connect more fall through edges, the + algorithm is modified as follows: + (1) Break long trace to short ones. The trace is broken at a block, which + has multi-predecessors/successors during finding traces. When connecting + traces, only connect Trace n with Trace n + 1. This change reduces most + long jumps compared with the above algorithm. + (2) Ignore the edge probability and frequency for fall through edges. + (3) Keep its original order when there is no chance to fall through. bbro + bases on the result of cfg_cleanup, which does lots of optimizations on cfg. + So the order is expected to be kept if no fall through. + + To implement the change for code size optimization, block's index is + selected as the key and all traces are found in one round. > @@ -558,6 +564,14 @@ find_traces_1_round (int branch_th, int exec_th, > gcov_type count_th, > /* Add all non-selected successors to the heaps. */ > FOR_EACH_EDGE (e, ei, bb->succs) > { > + /* Wait for the predecessors. */ > + if ((e == best_edge) && for_size > + && (EDGE_COUNT (best_edge->dest->succs) > 1 > + || EDGE_COUNT (best_edge->dest->preds) > 1)) > + { > + best_edge = NULL; > + } > + > if (e == best_edge > || e->dest == EXIT_BLOCK_PTR > || bb_visited_trace (e->dest)) > > I don't really understand what this means and why this is done here. If you > don't want to add the best destination in some cases, why not do it just > before the loop and explicitly state the reason? And you don't need > parentheses. The code segment should be before the loop. Add the following comments for the code: + /* If the best destination has multiple successors or predecessors, + don't allow it to be added when optimizing for size. This makes + sure predecessors with smaller index handled before the best + destination. It breaks long trace and reduces long jumps. + + Take if-then-else as an example. + A + / \ + B C + \ / + D + If we do not remove the best edge B->D/C->D. The final order is + A B D ... C. C is at the end of the program. If D or successors + of D are complicated, might need long jump for A->C and C->D. + Similar issue for order: A C D ... B. + + After removing the best edge, the final result will be ABCD/ACBD. + It does not add jump compared with the previous order. But it + reduce the possibility of long jump. */ + if (best_edge && for_size + && (EDGE_COUNT (best_edge->dest->succs) > 1 + || EDGE_COUNT (best_edge->dest->preds) > 1)) + best_edge = NULL; All other comments are accepted. The updated patch is attached. Is it OK? Thanks! -Zhenqiang
Enable-bbro-for-size-updated2.patch
Description: Binary data