On 11/23/14 15:22, Sebastian Pop wrote:
> The second patch attached limits the search for FSM jump threads to loops. With
> that patch, we are now down to 470 jump threads in an x86_64-linux bootstrap
> (and 424 jump threads on powerpc64-linux bootstrap.)
Yeah, that was one of the things I was going to poke at as well, since a
quick scan of your patch gave me the impression it wasn't limited to loops.
Again, I haven't looked much at the patch, but I got the impression
you're doing a backwards walk through the predecessors to discover the
result of the COND_EXPR. Correct?
That's something I'd been wanting to do -- basically start with a
COND_EXPR, then walk the dataflow backwards substituting values into the
COND_EXPR (possibly creating non-gimple). Ultimately the goal is to
substitute and fold, getting to a constant :-)
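
To make the backwards walk concrete, here is a toy sketch in C. It is not GCC code: the `struct def` IR, the `DEF_*` kinds, and `fold_cond_backward` are all made up for illustration, and it only handles a condition of the form "name == constant" along a single path. It starts at the condition, follows each definition backwards, rewrites the right-hand side as it goes, and stops when it reaches a constant or something it cannot see through.

```c
#include <assert.h>
#include <stddef.h>

/* Toy SSA definitions.  Everything here is hypothetical, not GCC's IR. */
enum def_kind { DEF_CONST, DEF_COPY, DEF_ADD_CONST, DEF_PHI, DEF_UNKNOWN };

struct def {
  enum def_kind kind;
  long value;   /* DEF_CONST: the constant; DEF_ADD_CONST: the addend */
  int src[2];   /* DEF_COPY and DEF_ADD_CONST use src[0]; DEF_PHI uses both */
};

/* Try to fold the condition "defs[name] == rhs" along one path.
   path[i] selects the incoming edge (0 or 1) of the i-th PHI met while
   walking backwards.  Returns 1 and stores the folded truth value in
   *result when the condition becomes constant, 0 otherwise.
   Arithmetic overflow is ignored in this sketch.  */
int
fold_cond_backward (const struct def *defs, size_t n_defs,
                    int name, long rhs,
                    const int *path, size_t path_len, int *result)
{
  size_t used = 0;
  /* Step budget so that malformed (cyclic) input cannot loop forever. */
  size_t budget = (path_len + 1) * n_defs;

  while (budget-- > 0)
    {
      assert (name >= 0 && (size_t) name < n_defs);
      const struct def *d = &defs[name];

      switch (d->kind)
        {
        case DEF_CONST:
          /* Substituted all the way down to a constant: fold. */
          *result = (d->value == rhs);
          return 1;
        case DEF_COPY:
          name = d->src[0];
          break;
        case DEF_ADD_CONST:
          /* x + c == rhs  <=>  x == rhs - c */
          rhs -= d->value;
          name = d->src[0];
          break;
        case DEF_PHI:
          if (used == path_len)
            return 0;           /* ran out of path information */
          assert (path[used] == 0 || path[used] == 1);
          name = d->src[path[used++]];
          break;
        default:
          return 0;             /* cannot see through this definition */
        }
    }
  return 0;
}
```

With definitions `a = 3`, `p = PHI <a, unknown>`, `t = p + 2`, `u = t`, the condition `u == 5` folds to true when the path enters the PHI on edge 0, and stays unknown on edge 1. A real implementation would also have to deal with general expressions and the non-gimple they create.
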
The forward exhaustive stuff we do now is crazy. The backwards
approach could be decoupled from DOM & VRP into an independent pass,
which I think would be wise.
Using a SEME region copier is also something I really wanted to do long
term. In fact, I believe a lot of tree-ssa-threadupdate.c ought to be
ripped out and replaced with a SEME based copier.
It appears you've built at least parts of the two pieces needed to do all
this as a Bodik-style optimizer, which is exactly the long-term direction I
think this code ought to take.
> One of the reasons I think we see more branches is that in SESE region copying
> we do not use the knowledge of the value of the condition for the last branch
> in a jump-thread path: we rely on other propagation passes to remove the
> branch. The last attached patch adds:
>
>   /* Remove the last branch in the jump thread path. */
>   remove_ctrl_stmt_and_useless_edges (region_copy[n_region - 1], exit->dest);
That's certainly a possibility. But I would expect that even with this
limitation something would be picking up the fact that the branch is
statically computable (even if it's an RTL optimizer). But it's
definitely something to look for.
> Please let me know if the attached patches are producing better results on gcc.
For the trunk:
  instructions: 1339016494968
  branches:      243568982489
First version of your patch:
  instructions: 1339739533291
  branches:      243806615986
Latest version of your patch:
  instructions: 1339749122609
  branches:      243809838262
That is in the noise for this test, which makes me wonder if I botched
something on the latest run. It doesn't appear so, but I'm re-running
just to be sure. I'm also turning on -g so that I can use cg_annotate
to poke a bit deeper and perhaps identify one or more concrete examples
where your patch is making this worse.
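
For scale, the relative differences between those runs can be worked out with a small helper like the one below. This is just a sketch; `pct_change` is a made-up name, not part of any tool used for these measurements.

```c
#include <assert.h>

/* Hypothetical helper: percent change of `now` relative to `base`.
   Counts of this size (around 1e12) are exactly representable as
   doubles, so the conversion loses nothing.  */
double
pct_change (unsigned long long base, unsigned long long now)
{
  assert (base != 0);
  return ((double) now - (double) base) * 100.0 / (double) base;
}
```

Plugging in the trunk and latest-patch counts, `pct_change (1339016494968ULL, 1339749122609ULL)` comes out at roughly 0.05 percent more instructions, and `pct_change (243568982489ULL, 243809838262ULL)` at roughly 0.1 percent more branches.
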
Jeff