On 11/23/14 15:22, Sebastian Pop wrote:
The second patch attached limits the search for FSM jump threads to loops.  With
that patch, we are now down to 470 jump threads in an x86_64-linux bootstrap
(and 424 jump threads on powerpc64-linux bootstrap.)

Yea, that was one of the things I was going to poke at as well, since a quick scan of your patch gave me the impression it wasn't limited to loops.

Again, I haven't looked much at the patch, but I got the impression you're doing a backwards walk through the predecessors to discover the result of the COND_EXPR. Correct?

That's something I'd been wanting to do -- basically start with a COND_EXPR, then walk the dataflow backwards substituting values into the COND_EXPR (possibly creating non-gimple). Ultimately the goal is to substitute and fold, getting to a constant :-)
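
To make the backwards walk concrete, here is a rough sketch in C. It is not GCC code and does not use GCC's internals: the toy_block structure, the def_kind enum and toy_fold_cond_backward are hypothetical names made up for illustration, and the model assumes a single chain of predecessors with at most one assignment per block. It starts from a condition "var == k", walks predecessors substituting copies, and folds once it reaches a constant definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy IR, not GCC's: each block may assign one variable
   and has at most one predecessor on the path being examined.  */
enum def_kind { DEF_NONE, DEF_CONST, DEF_COPY };

struct toy_block {
  const struct toy_block *pred;  /* predecessor on the candidate path */
  enum def_kind kind;            /* what this block defines, if anything */
  int lhs;                       /* variable defined here */
  int rhs_var;                   /* source variable for DEF_COPY */
  long rhs_val;                  /* constant value for DEF_CONST */
};

/* Walk backwards from BB trying to fold the condition "VAR == K".
   Returns 1 and stores the folded truth value in *RESULT when a
   constant definition is reached, 0 when the value stays unknown.  */
int
toy_fold_cond_backward (const struct toy_block *bb, int var, long k,
                        int *result)
{
  assert (result != NULL);

  for (; bb != NULL; bb = bb->pred)
    {
      /* Blocks that do not define the variable we track are skipped.  */
      if (bb->kind == DEF_NONE || bb->lhs != var)
        continue;

      if (bb->kind == DEF_COPY)
        {
          /* Substitute: keep walking, now tracking the copy source.  */
          var = bb->rhs_var;
          continue;
        }

      /* DEF_CONST: the condition folds to a constant.  */
      *result = (bb->rhs_val == k);
      return 1;
    }

  return 0;
}
```

A real implementation would have to deal with PHI nodes, multiple predecessors and arbitrary expressions; the sketch only shows the substitute-and-fold direction of the walk.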

The forward exhaustive stuff we do now is crazy. The backwards approach could be decoupled from DOM & VRP into an independent pass, which I think would be wise.

Using a SEME region copier is also something I really wanted to do long term. In fact, I believe a lot of tree-ssa-threadupdate.c ought to be ripped out and replaced with a SEME based copier.

It appears you've built at least parts of two pieces needed to do all this as a Bodik style optimizer, which is exactly the long term direction I think this code ought to take.



One of the reasons I think we see more branches is that in sese region copying we
do not use the knowledge of the value of the condition for the last branch in a
jump-thread path: we rely on other propagation passes to remove the branch.  The
last attached patch adds:

   /* Remove the last branch in the jump thread path.  */
   remove_ctrl_stmt_and_useless_edges (region_copy[n_region - 1], exit->dest);

That's certainly a possibility. But I would expect that even with this limitation something would be picking up the fact that the branch is statically computable (even if it's an RTL optimizer). But it's definitely something to look for.


Please let me know if the attached patches are producing better results on gcc.

For the trunk:
  instructions: 1339016494968
  branches:     243568982489

First version of your patch:

  instructions: 1339739533291
  branches:     243806615986

Latest version of your patch:

  instructions: 1339749122609
  branches:     243809838262

Which is in the noise for this test. Which makes me wonder if I botched something on the latest run. It doesn't appear so, but I'm re-running just to be sure. I'm also turning on -g so that I can use cg_annotate to poke a bit deeper and perhaps identify one or more concrete examples where your patch is making this worse.

Jeff

