Jeff Law wrote:
> On 11/23/14 15:22, Sebastian Pop wrote:
> > The second patch attached limits the search for FSM jump threads
> > to loops.  With that patch, we are now down to 470 jump threads in
> > an x86_64-linux bootstrap (and 424 jump threads on powerpc64-linux
> > bootstrap.)
>
> Yea, that was one of the things I was going to poke at as well as a
> quick scan of your patch gave me the impression it wasn't limited to
> loops.
>
> Again, I haven't looked much at the patch, but I got the impression
> you're doing a backwards walk through the predecessors to discover
> the result of the COND_EXPR.  Correct?

Yes.
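To make that concrete, here is a made-up reduced example (invented for
illustration, not taken from the patch or from any benchmark) of the
kind of loop the FSM threader is after; the comments describe how the
backward walk through the predecessors resolves the COND_EXPR:

  /* A tiny state machine in a loop.  Starting from the condition
     "state == 1" and walking backwards through the predecessors, the
     path coming from the "state = 1" block around the loop back edge
     substitutes the constant 1 and folds the condition to true; the
     FSM threader can then duplicate that path so it branches straight
     to the "n++" block instead of re-testing "state".  */

  int
  count_after_a (const char *s)
  {
    int state = 0, n = 0;

    while (*s)
      {
        if (state == 1)
          {
            n++;
            state = 2;
          }
        else if (*s == 'a')
          /* From here, the value of "state" at the next "state == 1"
             test, reached through the loop back edge, is known to be
             1, so that test is statically computable (the "*s" exit
             test of course still has to stay on the duplicated
             path).  */
          state = 1;
        else
          state = 0;
        s++;
      }

    return n;
  }

This is also why removing the last branch in the duplicated region
matters: on the copied path the result of that condition is already
known.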
> That's something I'd been wanting to do -- basically start with a
> COND_EXPR, then walk the dataflow backwards substituting values into
> the COND_EXPR (possibly creating non-gimple).  Ultimately the goal
> is to substitute and fold, getting to a constant :-)
>
> The forward exhaustive stuff we do now is, crazy.  The backwards
> approach could be decoupled from DOM & VRP into an independent pass,
> which I think would be wise.
>
> Using a SEME region copier is also something I really wanted to do
> long term.  In fact, I believe a lot of tree-ssa-threadupdate.c
> ought to be ripped out and replaced with a SEME based copier.

I did an experiment along these lines over the week-end, and now that
you mention it, I feel less shy about mentioning it: the patch does
not yet pass bootstrap, and there are still about 20 failing
test-cases.  I feel better reading the code generation part of jump
threading after this patch ;-)  Basically, I think all of
tree-ssa-threadupdate.c can be replaced by duplicate_seme_region,
which generalizes the code generation.

> It appears you've built at least parts of two pieces needed to all
> this as a Bodik style optimizer.  Which is exactly the long term
> direction I think this code ought to take.
>
> > One of the reasons I think we see more branches is that in sese
> > region copying we do not use the knowledge of the value of the
> > condition for the last branch in a jump-thread path: we rely on
> > other propagation passes to remove the branch.  The last attached
> > patch adds:
> >
> >   /* Remove the last branch in the jump thread path. */
> >   remove_ctrl_stmt_and_useless_edges (region_copy[n_region - 1],
> >                                       exit->dest);
>
> That's certainly a possibility.  But I would expect that even with
> this limitation something would be picking up the fact that the
> branch is statically computable (even if it's an RTL optimizer).
> But it's definitely something to look for.
>
> > Please let me know if the attached patches are producing better
> > results on gcc.
>
> For the trunk:
>   instructions: 1339016494968
>   branches:      243568982489
>
> First version of your patch:
>   instructions: 1339739533291
>   branches:      243806615986
>
> Latest version of your patch:
>   instructions: 1339749122609
>   branches:      243809838262

I think I got about the same results.  I got my scripts installed on
the gcc-farm.  I first tried gcc75, an x86_64 machine, but valgrind
was crashing there, not recognizing how to decode an instruction.
Then I moved to gcc112, a powerpc64-linux machine, where I collected
the following data from the stage2 cc1plus compiling the same file,
alias.ii, at -O2.  (I did 3 runs of each configuration, mostly because
there is a bit of noise in all these numbers.)

$ valgrind --tool=cachegrind --cache-sim=no --branch-sim=yes ./cc1plus -O2 ~/alias.ii

all 4 patches:

==153617== I   refs:      13,914,038,211
==153617==
==153617== Branches:       1,926,407,760  (1,879,827,481 cond + 46,580,279 ind)
==153617== Mispredicts:      144,890,904  (  132,094,105 cond + 12,796,799 ind)
==153617== Mispred rate:             7.5% (          7.0%     +       27.4%   )

==34993== I   refs:      13,915,335,629
==34993==
==34993== Branches:       1,926,597,919  (1,880,017,558 cond + 46,580,361 ind)
==34993== Mispredicts:      144,974,266  (  132,177,440 cond + 12,796,826 ind)
==34993== Mispred rate:             7.5% (          7.0%     +       27.4%   )

==140841== I   refs:      13,915,334,459
==140841==
==140841== Branches:       1,926,597,819  (1,880,017,458 cond + 46,580,361 ind)
==140841== Mispredicts:      144,974,296  (  132,177,470 cond + 12,796,826 ind)
==140841== Mispred rate:             7.5% (          7.0%     +       27.4%   )

patch 1:

==99902== I   refs:      13,915,069,710
==99902==
==99902== Branches:       1,926,963,813  (1,880,376,148 cond + 46,587,665 ind)
==99902== Mispredicts:      145,501,564  (  132,656,576 cond + 12,844,988 ind)
==99902== Mispred rate:             7.5% (          7.0%     +       27.5%   )

==3907== I   refs:      13,915,082,469
==3907==
==3907== Branches:       1,926,965,218  (1,880,377,471 cond + 46,587,747 ind)
==3907== Mispredicts:      145,501,569  (  132,656,554 cond + 12,845,015 ind)
==3907== Mispred rate:             7.5% (          7.0%     +       27.5%   )

==44271== I   refs:      13,915,111,997
==44271==
==44271== Branches:       1,926,968,863  (1,880,380,952 cond + 46,587,911 ind)
==44271== Mispredicts:      145,501,858  (  132,656,789 cond + 12,845,069 ind)
==44271== Mispred rate:             7.5% (          7.0%     +       27.5%   )

master no-patch:

==129233== I   refs:      13,910,221,913
==129233==
==129233== Branches:       1,925,715,095  (1,879,277,776 cond + 46,437,319 ind)
==129233== Mispredicts:      144,133,332  (  131,510,534 cond + 12,622,798 ind)
==129233== Mispred rate:             7.4% (          6.9%     +       27.1%   )

==147659== I   refs:      13,910,216,249
==147659==
==147659== Branches:       1,925,714,029  (1,879,276,708 cond + 46,437,321 ind)
==147659== Mispredicts:      144,127,970  (  131,505,172 cond + 12,622,798 ind)
==147659== Mispred rate:             7.4% (          6.9%     +       27.1%   )

==155206== I   refs:      13,910,201,237
==155206==
==155206== Branches:       1,925,712,267  (1,879,275,030 cond + 46,437,237 ind)
==155206== Mispredicts:      144,128,313  (  131,505,542 cond + 12,622,771 ind)
==155206== Mispred rate:             7.4% (          6.9%     +       27.1%   )

I think there are about 5 million more instructions executed with the
first patch, and the other patches on top do not really help.

> Which is in the noise for this test.  Which makes me wonder if I
> botched something on the latest run.  It doesn't appear so, but I'm
> re-running just to be sure.  I'm also turning on -g so that I can
> use cg_annotate to poke a bit deeper and perhaps identify one or
> more concrete examples where your patch is making this worse.

Thanks,
Sebastian