Bernd Schmidt <ber...@codesourcery.com> writes: > This adds the actual optimization, and reworks the JUMP_LABEL handling > for return blocks. See the introduction mail or the new comment ahead of > thread_prologue_and_epilogue_insns for more notes.
It seems a shame to have both (return) and (simple_return). You said that we need the distinction in order to cope with targets like ARM, whose (return) instruction actually performs some of the epilogue too. It feels like the load of the saved registers should really be expressed in rtl, in parallel with the return. I realise that'd prevent conditional returns though. Maybe there's no elegant way out... With the hidden loads, it seems like we'll have a situation in which the values of call-saved registers will appear to be different for different "real" incoming edges to the exit block. Is JUMP_LABEL ever null after this change? (In fully-complete rtl sequences, I mean.) It looked like some of the null checks in the patch might not be necessary any more. JUMP_LABEL also seems somewhat misnamed after this change; maybe JUMP_TARGET would be better? I'm the last person who should be recommending names though. I know it's a pain, but it'd really help if you could split the "JUMP_LABEL == a return rtx" stuff out. I think it'd also be worth splitting the RETURN_ADDR_REGNUM bit out into a separate patch, and handling other things in a more generic way. E.g. the default INCOMING_RETURN_ADDR_RTX could then be: #define INCOMING_RETURN_ADDR_RTX gen_rtx_REG (Pmode, RETURN_ADDR_REGNUM) and df.c:df_get_exit_block_use_set should include RETURN_ADDR_REGNUM when epilogue_completed. It'd be nice to handle cases in which all references to the stack pointer are to the incoming arguments. Maybe mention the fact that we don't as another source of conservatism? It'd also be nice to get rid of all these big blocks of code that are conditional on preprocessor macros, but I realise you're just following existing practice in the surrounding code, so again it can be left to a future cleanup. > @@ -1280,7 +1297,7 @@ force_nonfallthru_and_redirect (edge e, > basic_block > force_nonfallthru (edge e) > { > - return force_nonfallthru_and_redirect (e, e->dest); > + return force_nonfallthru_and_redirect (e, e->dest, NULL_RTX); > } Maybe assert here that e->dest isn't the exit block? I realise it will be caught by the: gcc_assert (jump_label == simple_return_rtx); check, but an assert here would make it more obvious what had gone wrong. > - if (GET_CODE (x) == RETURN) > + if (GET_CODE (x) == RETURN || GET_CODE (x) == SIMPLE_RETURN) ANY_RETURN_P (x). A few other cases. > @@ -5654,6 +5658,7 @@ init_emit_regs (void) > /* Assign register numbers to the globally defined register rtx. */ > pc_rtx = gen_rtx_fmt_ (PC, VOIDmode); > ret_rtx = gen_rtx_fmt_ (RETURN, VOIDmode); > + simple_return_rtx = gen_rtx_fmt_ (SIMPLE_RETURN, VOIDmode); > cc0_rtx = gen_rtx_fmt_ (CC0, VOIDmode); > stack_pointer_rtx = gen_raw_REG (Pmode, STACK_POINTER_REGNUM); > frame_pointer_rtx = gen_raw_REG (Pmode, FRAME_POINTER_REGNUM); It'd be nice to s/ret_rtx/return_rtx/ for consistency, but that can happen anytime. > +/* Return true if INSN requires the stack frame to be set up. */ > +static bool > +requires_stack_frame_p (rtx insn) > +{ > + HARD_REG_SET hardregs; > + unsigned regno; > + > + if (!INSN_P (insn) || DEBUG_INSN_P (insn)) > + return false; > + if (CALL_P (insn)) > + return !SIBLING_CALL_P (insn); > + if (for_each_rtx (&PATTERN (insn), frame_required_for_rtx, NULL)) > + return true; > + CLEAR_HARD_REG_SET (hardregs); > + note_stores (PATTERN (insn), record_hard_reg_sets, &hardregs); > + AND_COMPL_HARD_REG_SET (hardregs, call_used_reg_set); > + for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++) > + if (TEST_HARD_REG_BIT (hardregs, regno) > + && df_regs_ever_live_p (regno)) > + return true; This can be done as a follow-up, but it looks like df should be using a HARD_REG_SET here, and that we should be able to get at it directly. > + FOR_EACH_EDGE (e, ei, bb->preds) > + if (!bitmap_bit_p (&bb_antic_flags, e->src->index)) > + { > + VEC_quick_push (basic_block, vec, e->src); > + bitmap_set_bit (&bb_on_list, e->src->index); > + } && !bitmap_bit_p (&bb_on_list, e->src->index) ? > + } > + while (!VEC_empty (basic_block, vec)) > + { > + basic_block tmp_bb = VEC_pop (basic_block, vec); > + edge e; > + edge_iterator ei; > + bool all_set = true; > + > + bitmap_clear_bit (&bb_on_list, tmp_bb->index); > + FOR_EACH_EDGE (e, ei, tmp_bb->succs) > + { > + if (!bitmap_bit_p (&bb_antic_flags, e->dest->index)) > + { > + all_set = false; > + break; > + } > + } > + if (all_set) > + { > + bitmap_set_bit (&bb_antic_flags, tmp_bb->index); > + FOR_EACH_EDGE (e, ei, tmp_bb->preds) > + if (!bitmap_bit_p (&bb_antic_flags, e->src->index)) > + { > + VEC_quick_push (basic_block, vec, e->src); > + bitmap_set_bit (&bb_on_list, e->src->index); > + } same here. > + } > + } > + /* Find exactly one edge that leads to a block in ANTIC from > + a block that isn't. */ > + if (!bitmap_bit_p (&bb_antic_flags, entry_edge->dest->index)) > + FOR_EACH_BB (bb) > + { > + if (!bitmap_bit_p (&bb_antic_flags, bb->index)) > + continue; > + FOR_EACH_EDGE (e, ei, bb->preds) > + if (!bitmap_bit_p (&bb_antic_flags, e->src->index)) > + { > + if (entry_edge != orig_entry_edge) > + { > + entry_edge = orig_entry_edge; > + goto fail_shrinkwrap; > + } > + entry_edge = e; > + } > + } AIUI, this prevents the optimisation for things like if (a) { switch (b) { case 1: ...stuff that requires a frame... break; case 2: ...stuff that requires a frame... break; default: ...stuff that doesn't require a frame... break; } } The switch won't be in ANTIC, but it will have two successors that are. Is that right? Would it work to do something like: FOR_EACH_BB (bb) { rtx insn; FOR_BB_INSNS (bb, insn) if (requires_stack_frame_p (insn)) { bitmap_set_bit (&bb_flags, bb->index); break; } } if (bitmap_empty_p (bb_flags)) ...no frame needed... else { calculate_dominance_info (CDI_DOMINATORS); bb = nearest_common_dominator_for_set (CDI_DOMINATORS, bb_flags); if (bb == ENTRY_BLOCK_PTR) ...bleh... else ...insert prologue at the beginning of bb... } ? Or (for a different trade-off) just use nearest_common_dominator directly, and avoid the bitmap. > @@ -5515,25 +5841,38 @@ thread_prologue_and_epilogue_insns (void > set_insn_locators (seq, epilogue_locator); > > seq = get_insns (); > + returnjump = get_last_insn (); > end_sequence (); > > - insert_insn_on_edge (seq, e); > + insert_insn_on_edge (seq, exit_fallthru_edge); > inserted = true; > + if (JUMP_P (returnjump)) > + { > + rtx pat = PATTERN (returnjump); > + if (GET_CODE (pat) == PARALLEL) > + pat = XVECEXP (pat, 0, 0); > + if (ANY_RETURN_P (pat)) > + JUMP_LABEL (returnjump) = pat; > + else > + JUMP_LABEL (returnjump) = ret_rtx; > + } > + else > + returnjump = NULL_RTX; Does the "JUMP_LABEL (returnjump) = ret_rtx;" handle targets that use things like (set (pc) (reg RA)) as their return? Probably worth adding a comment if so. I didn't review much after this, because it was hard to sort the simple_return stuff out from the "JUMP_LABEL can be a return rtx" change. Richard