On Wed, Aug 29, 2012 at 10:42 AM, Zhenqiang Chen <zhenqiang.c...@arm.com> wrote:
>> -----Original Message-----
>> From: Steven Bosscher [mailto:stevenb....@gmail.com]
>> Sent: Friday, August 24, 2012 8:17 PM
>> To: Zhenqiang Chen
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: Re: Ping: [PATCH] Enable bbro for -Os
>>
>> On Wed, Aug 22, 2012 at 8:49 AM, Zhenqiang Chen
>> <zhenqiang.c...@arm.com> wrote:
>> >> The patch is to enable bbro for -Os. When optimizing for size, it
>> >> * avoids duplicating blocks.
>> >> * keeps a block's original order if there is no chance to fall through.
>> >> * ignores edge frequency and probability.
>> >> * handles a predecessor first if its index is smaller, to break long traces.
>>
>> You do this by inserting the index as a key. I don't fully understand
>> this change. You're assuming that a block with a lower index has a lower
>> pre-order number in the CFG's DFS spanning tree, IIUC (i.e. the blocks
>> are numbered sequentially)? I'm not sure that's always true. I think you
>> should add an explanation for this heuristic.
>
> Thank you for the comments.
>
> cleanup_cfg is called at the end of cfg_layout_initialize, before
> reorder_basic_blocks. cleanup_cfg does a lot of optimization on the CFG
> and renumbers the basic blocks. After cleanup_cfg, the blocks are roughly
> numbered sequentially.
Well, sequentially in their current "order", which is not in any way
flow-controlled.

> The heuristic is based on the result of cleanup_cfg. It just wants to
> keep the order produced by cleanup_cfg, since logs show we get a code
> size improvement (from cleanup_cfg) even if we do not call
> reorder_basic_blocks. Using the "index as a key" is a simple way to keep
> the original order.

That's true.

> Comments are added in the updated patch.
>
>> >> * only connect Trace n with Trace n + 1 to reduce long jumps.
>> ...
>> >> * bb-reorder.c (connect_better_edge_p): New added.
>> >>   (find_traces_1_round): When optimizing for size, ignore edge
>> >>   frequency and probability, and handle all in one round.
>> >>   (bb_to_key): Use bb->index as key for size.
>> >>   (better_edge_p): The smaller bb index is better for size.
>> >>   (connect_traces): Connect block n with block n + 1;
>> >>   connect trace m with trace m + 1 if falling through.
>> >>   (copy_bb_p): Avoid duplicating blocks.
>> >>   (gate_handle_reorder_blocks): Enable bbro when optimizing for -Os.
>>
>> This probably fixes PR54364.
>
> Trying the case from PR54364, the patch does remove several jmp insns.
>
>> > @@ -1169,6 +1272,10 @@ copy_bb_p (const_basic_block bb, int code_may_grow)
>> >    int max_size = uncond_jump_length;
>> >    rtx insn;
>> >
>> > +  /* Avoid duplicating blocks for size.  */
>> > +  if (optimize_function_for_size_p (cfun))
>> > +    return false;
>> > +
>> >    if (!bb->frequency)
>> >      return false;
>>
>> This shouldn't be necessary, due to the CODE_MAY_GROW argument, and this
>> change should result in a code size increase because jumps to conditional
>> jumps aren't removed anymore. What did you make this change for? Do you
>> have a test case where code size increases if you allow copy_bb_p to
>> return true?
>
> Thanks. It is not necessary.
>
> Here is the updated ChangeLog. The updated patch is attached.
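[For readers following the thread: the "index as a key" idea under discussion can be sketched in plain, standalone C. This is an illustrative toy, not GCC's actual bb_to_key — the names toy_bb and toy_bb_to_key are made up — showing how keying the trace-finding heap on the block index preserves the post-cleanup_cfg order at -Os, while a frequency-based key favors hot blocks for speed.]

```c
#include <assert.h>

/* Toy model of a basic block as relevant to the heuristic: its position
   after cleanup_cfg renumbering, and an execution-frequency estimate.  */
struct toy_bb
{
  int index;      /* position assigned by cleanup_cfg */
  int frequency;  /* estimated execution frequency */
};

/* Smaller key = popped from the trace-finding heap earlier.
   For size, the key is the index itself, so blocks are visited in
   cleanup_cfg's order; for speed, hotter blocks get smaller keys.  */
static int
toy_bb_to_key (const struct toy_bb *bb, int for_size)
{
  if (for_size)
    return bb->index;
  return -bb->frequency;
}
```

With this key, a cold block that cleanup_cfg placed early is still handled before a hot block placed later at -Os, which is exactly the "keep the original order" behavior described above.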
>
> ChangeLog
> 2012-08-29  Zhenqiang Chen  <zhenqiang.c...@arm.com>
>
> 	PR middle-end/54364
> 	* bb-reorder.c (connect_better_edge_p): New added.
> 	(find_traces_1_round): When optimizing for size, ignore edge
> 	frequency and probability, and handle all in one round.
> 	(bb_to_key): Use bb->index as key for size.
> 	(better_edge_p): The smaller bb index is better for size.
> 	(connect_traces): Connect block n with block n + 1;
> 	connect trace m with trace m + 1 if falling through.
> 	(gate_handle_reorder_blocks): Enable bbro when optimizing for -Os.

@@ -530,10 +544,11 @@ find_traces_1_round (int branch_th, int exec_th, gcov_type count_th,
 	    }

 	  /* Edge that cannot be fallthru or improbable or infrequent
-	     successor (i.e. it is unsuitable successor).  */
+	     successor (i.e. it is unsuitable successor).
+	     For size, ignore the frequency and probability.  */
 	  if (!(e->flags & EDGE_CAN_FALLTHRU) || (e->flags & EDGE_COMPLEX)
-	      || prob < branch_th || EDGE_FREQUENCY (e) < exec_th
-	      || e->count < count_th)
+	      || (prob < branch_th || EDGE_FREQUENCY (e) < exec_th
+		  || e->count < count_th) && !for_size)
 	    continue;

Why that change? It seems you do re-orderings that would not be done with
-Os even though your goal was to preserve the original ordering.

+	      /* Wait for the predecessors.  */
+	      if ((e == best_edge) && for_size
+		  && (EDGE_COUNT (best_edge->dest->succs) > 1
+		      || EDGE_COUNT (best_edge->dest->preds) > 1))
+		{
+		  best_edge = NULL;
+		}

I don't understand this (well, I'm not very familiar with bb-reorder);
doesn't that mean you rather want to push this block to the next round?

Overall I think this patch looks like adding hacks into the heuristics to
make them sane for -Os, instead of doing what the comment suggests:

-  /* Don't reorder blocks when optimizing for size because extra jump insns
-     may be created; also barrier may create extra padding.
-
-     More correctly we should have a block reordering mode that tried to
-     minimize the combined size of all the jumps.  This would more or less
-     automatically remove extra jumps, but would also try to use more short
-     jumps instead of long jumps.  */
-  if (!optimize_function_for_speed_p (cfun))
-    return false;

So, can you categorize the reorderings that you see are profitable? How
much gain do you get when you try to minimize the number of jumps (thus,
maximize the number of fallthrus)?

Richard.
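[Editorial aside: the boolean restructuring in the find_traces_1_round condition discussed above can be checked with a small standalone sketch. edge_unsuitable_p is a hypothetical name, and the flat parameter list stands in for what GCC actually reads from the edge and thresholds; it mirrors only the shape of the patched condition, not the real data structures.]

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical standalone model of the patched suitability test:
   the probability/frequency/count thresholds are grouped together and
   ANDed with !for_size, so when optimizing for size an edge is rejected
   only if it cannot fall through or is complex -- never merely for
   being cold.  */
static bool
edge_unsuitable_p (bool can_fallthru, bool complex_edge,
                   int prob, int branch_th,
                   int freq, int exec_th,
                   long count, long count_th,
                   bool for_size)
{
  return !can_fallthru || complex_edge
         || ((prob < branch_th || freq < exec_th || count < count_th)
             && !for_size);
}
```

This makes Richard's question concrete: with for_size set, cold edges that the speed heuristic would skip now become candidate successors, which is why the change can produce re-orderings rather than strictly preserving the original layout.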