On Tue, Sep 27, 2016 at 03:14:51PM -0600, Jeff Law wrote: > On 09/23/2016 02:21 AM, Segher Boessenkool wrote: > >--- a/gcc/function.c > >+++ b/gcc/function.c > >@@ -5920,9 +5920,7 @@ thread_prologue_and_epilogue_insns (void) > > edge entry_edge = single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)); > > edge orig_entry_edge = entry_edge; > > > >- rtx_insn *split_prologue_seq = make_split_prologue_seq (); > > rtx_insn *prologue_seq = make_prologue_seq (); > >- rtx_insn *epilogue_seq = make_epilogue_seq (); > > > > /* Try to perform a kind of shrink-wrapping, making sure the > > prologue/epilogue is emitted only around those parts of the > >@@ -5930,6 +5928,13 @@ thread_prologue_and_epilogue_insns (void) > > > > try_shrink_wrapping (&entry_edge, prologue_seq); > > > >+ try_shrink_wrapping_separate (entry_edge->dest); > >+ > >+ if (crtl->shrink_wrapped_separate) > >+ prologue_seq = make_prologue_seq (); > Note this implies that make_prologue_seq (and consequently the target > specific bits for prologue generation) can be safely called more than > once. That's probably OK, but might be worth a comment -- your decision > whether or not to add the comment.
It is only called twice if separately shrink-wrapping (and that did anything), so it won't change current behaviour. I'll add a comment. > >+static void > >+place_prologue_for_one_component (unsigned int which, basic_block head) > >+{ > >+ /* The block we are currently dealing with. */ > >+ basic_block bb = head; > >+ /* Is this the first time we visit this block, i.e. have we just gone > >+ down the tree. */ > >+ bool first_visit = true; > >+ > >+ /* Walk the dominator tree, visit one block per iteration of this loop. > >+ Each basic block is visited twice: once before visiting any children > >+ of the block, and once after visiting all of them (leaf nodes are > >+ visited only once). As an optimization, we do not visit subtrees > >+ that can no longer influence the prologue placement. */ > >+ for (;;) > Is there some reason you wrote this as a loop rather than recursion? > IMHO it makes this function (and spread_components) more difficult to > reason about than it needs to be. It would recurse a few thousand frames deep on not terribly big testcases (i.e. I've seen it happen). This uses a lot of stack space on some ABIs, and is very slow as well (this is the slowest function here by far). Unlimited recursion is bad. > >+/* Mark HAS_COMPONENTS for every block dominated by at least one block > >with > >+ PRO_COMPONENTS set for the respective components, starting at HEAD. */ > >+static void > >+spread_components (basic_block head) > I don't see any references to PRO_COMPONENTS in the actual code. If I'm > reading the code correctly, you're just accumulating/propagating > HAS_COMPONENTS from parents to children via a DFS walk of the dominator > tree, right? Yeah, s/PRO_COMPONENTS/HAS_COMPONENTS/. I have renamed things a few time but missed this one due to the ALL VARS IN ALL CAPS style :-) > >+/* Place code for prologues and epilogues for COMPONENTS where we can put > >+ that code at the end of basic blocks. */ > >+static void > >+emit_common_tails_for_components (sbitmap components) > [ Snip. ] > >+ > >+ /* Put the code at the end of the BB, but before any final jump. */ > >+ if (!bitmap_empty_p (epi)) > So what if the final jump uses hard registers in one way or another? I > don't immediately see anything that verifies it is safe to transpose the > epilogue and the final jump. Whoops. Thanks for catching this. > Conceptually want the epilogue to occur on the outgoing edge(s). But > you want to actually transpose the epilogue and the control flow insn so > that you can insert the epilogue in one place. The same problem happens with prologues, too. > But naive transposition isn't safe. The control flow insn may use one > or more registers that you were restoring or you may be on a cc0 target. A cc0 target can not use separate shrink-wrapping *anyway* if any of the components would clobber cc0, so that is no problem here. > I think you need to handle the former more cleanly. The latter I'd > be comfortable filtering out in try_shrink_wrapping_separate. I'm thinking to not do the common tail optimisation if BB_END is a JUMP_INSN but not simplejump_p (or a return or a sibling call). Do you see any problem with that? > This has similarities to some of the problems we've had in the past with > rtl-gcse insertion as well as output reloads on insns with control flow. Oh, I had so much fun recently with the latter :-/ > With transposition issue addressed, the only blocker I see are some > simple testcases we can add to the suite. They don't have to be real > extensive. Yeah, working on it. > And one motivating example for the list archives, ideally > the glibc malloc case. Okay. Thanks for the reviews, Segher