On Tue, Sep 27, 2016 at 03:14:51PM -0600, Jeff Law wrote:
> On 09/23/2016 02:21 AM, Segher Boessenkool wrote:
> >--- a/gcc/function.c
> >+++ b/gcc/function.c
> >@@ -5920,9 +5920,7 @@ thread_prologue_and_epilogue_insns (void)
> >   edge entry_edge = single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> >   edge orig_entry_edge = entry_edge;
> >
> >-  rtx_insn *split_prologue_seq = make_split_prologue_seq ();
> >   rtx_insn *prologue_seq = make_prologue_seq ();
> >-  rtx_insn *epilogue_seq = make_epilogue_seq ();
> >
> >   /* Try to perform a kind of shrink-wrapping, making sure the
> >      prologue/epilogue is emitted only around those parts of the
> >@@ -5930,6 +5928,13 @@ thread_prologue_and_epilogue_insns (void)
> >
> >   try_shrink_wrapping (&entry_edge, prologue_seq);
> >
> >+  try_shrink_wrapping_separate (entry_edge->dest);
> >+
> >+  if (crtl->shrink_wrapped_separate)
> >+    prologue_seq = make_prologue_seq ();
> Note this implies that make_prologue_seq (and consequently the target 
> specific bits for prologue generation) can be safely called more than 
> once.  That's probably OK, but might be worth a comment -- your decision 
> whether or not to add the comment.

It is only called twice if separately shrink-wrapping (and that did
anything), so it won't change current behaviour.

I'll add a comment.

> >+static void
> >+place_prologue_for_one_component (unsigned int which, basic_block head)
> >+{
> >+  /* The block we are currently dealing with.  */
> >+  basic_block bb = head;
> >+  /* Is this the first time we visit this block, i.e. have we just gone
> >+     down the tree.  */
> >+  bool first_visit = true;
> >+
> >+  /* Walk the dominator tree, visit one block per iteration of this loop.
> >+     Each basic block is visited twice: once before visiting any children
> >+     of the block, and once after visiting all of them (leaf nodes are
> >+     visited only once).  As an optimization, we do not visit subtrees
> >+     that can no longer influence the prologue placement.  */
> >+  for (;;)
> Is there some reason you wrote this as a loop rather than recursion? 
> IMHO it makes this function (and spread_components) more difficult to 
> reason about than it needs to be.

It would recurse a few thousand frames deep on not terribly big testcases
(i.e. I've seen it happen).  This uses a lot of stack space on some ABIs,
and is very slow as well (this is the slowest function here by far).
Unlimited recursion is bad.

> >+/* Mark HAS_COMPONENTS for every block dominated by at least one block 
> >with
> >+   PRO_COMPONENTS set for the respective components, starting at HEAD.  */
> >+static void
> >+spread_components (basic_block head)
> I don't see any references to PRO_COMPONENTS in the actual code.  If I'm 
> reading the code correctly, you're just accumulating/propagating 
> HAS_COMPONENTS from parents to children via a DFS walk of the dominator 
> tree, right?

Yeah, s/PRO_COMPONENTS/HAS_COMPONENTS/.  I have renamed things a few
time but missed this one due to the ALL VARS IN ALL CAPS style :-)

> >+/* Place code for prologues and epilogues for COMPONENTS where we can put
> >+   that code at the end of basic blocks.  */
> >+static void
> >+emit_common_tails_for_components (sbitmap components)
> [ Snip. ]
> >+
> >+      /* Put the code at the end of the BB, but before any final jump.  */
> >+      if (!bitmap_empty_p (epi))
> So what if the final jump uses hard registers in one way or another?   I 
> don't immediately see anything that verifies it is safe to transpose the 
> epilogue and the final jump.

Whoops.  Thanks for catching this.

> Conceptually want the epilogue to occur on the outgoing edge(s).  But 
> you want to actually transpose the epilogue and the control flow insn so 
> that you can insert the epilogue in one place.

The same problem happens with prologues, too.

> But naive transposition isn't safe.  The control flow insn may use one 
> or more registers that you were restoring or you may be on a cc0 target. 

A cc0 target can not use separate shrink-wrapping *anyway* if any of the
components would clobber cc0, so that is no problem here.

>   I think you need to handle the former more cleanly.  The latter I'd 
> be comfortable filtering out in try_shrink_wrapping_separate.

I'm thinking to not do the common tail optimisation if BB_END is a
JUMP_INSN but not simplejump_p (or a return or a sibling call).  Do
you see any problem with that?

> This has similarities to some of the problems we've had in the past with 
> rtl-gcse insertion as well as output reloads on insns with control flow.

Oh, I had so much fun recently with the latter :-/

> With transposition issue addressed, the only blocker I see are some 
> simple testcases we can add to the suite.  They don't have to be real 
> extensive.

Yeah, working on it.

> And one motivating example for the list archives, ideally 
> the glibc malloc case.

Okay.

Thanks for the reviews,


Segher

Reply via email to