On Mon, Nov 24, 2014 at 11:05 PM, Sebastian Pop <seb...@gmail.com> wrote:
> Jeff Law wrote:
>> On 11/23/14 15:22, Sebastian Pop wrote:
>> >The second patch attached limits the search for FSM jump threads to loops.  With
>> >that patch, we are now down to 470 jump threads in an x86_64-linux bootstrap
>> >(and 424 jump threads in a powerpc64-linux bootstrap.)
>> >
>> Yea, that was one of the things I was going to poke at as well, as a
>> quick scan of your patch gave me the impression it wasn't limited to
>> loops.
>>
>> Again, I haven't looked much at the patch, but I got the impression
>> you're doing a backwards walk through the predecessors to discover
>> the result of the COND_EXPR.  Correct?
>
> Yes.
>
>> That's something I'd been wanting to do -- basically start with a
>> COND_EXPR, then walk the dataflow backwards substituting values into
>> the COND_EXPR (possibly creating non-gimple).  Ultimately the goal
>> is to substitute and fold, getting to a constant :-)
>>
>> The forward exhaustive stuff we do now is crazy.  The backwards
>> approach could be decoupled from DOM & VRP into an independent pass,
>> which I think would be wise.
>>
>> Using a SEME region copier is also something I really wanted to do
>> long term.  In fact, I believe a lot of tree-ssa-threadupdate.c
>> ought to be ripped out and replaced with a SEME based copier.
>
> I did an experiment along these lines over the week-end, and now that you
> mention it, I feel less shy to speak about it; the patch does not yet pass
> bootstrap, and there are still about 20 failing test-cases.  I feel better
> reading the code generation part of jump-threading after this patch ;-)
> Basically I think all of tree-ssa-threadupdate.c can be replaced by
> duplicate_seme_region, which generalizes the code generation.
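[Editorial note: as a rough, self-contained illustration of the backwards
substitute-and-fold idea Jeff describes above -- walk the def chain backwards
from the condition's operands, substitute values known on the incoming path,
and try to fold to a constant -- here is a toy C++ sketch.  It is not GCC
code; the mini-IR, names, and example program are all hypothetical.]

    // Toy illustration (not GCC internals) of backwards substitute-and-fold:
    // reduce the operand of a conditional to a constant by walking defs
    // backwards and using values known on the path being threaded.
    #include <cstdio>
    #include <map>
    #include <string>

    // Each SSA-like name is a constant, a copy of another name, or
    // an addition of a name and a constant.
    struct Def {
      enum { CONST, COPY, ADD } kind;
      long value;          // CONST: the constant; ADD: the addend
      std::string operand; // COPY/ADD: the name being used
    };

    // Try to fold NAME to a constant, walking its defs backwards and
    // consulting KNOWN for values that hold on the incoming path.
    static bool fold_backwards(const std::string &name,
                               const std::map<std::string, Def> &defs,
                               const std::map<std::string, long> &known,
                               long *out) {
      auto k = known.find(name);
      if (k != known.end()) { *out = k->second; return true; }

      auto d = defs.find(name);
      if (d == defs.end()) return false; // no def: cannot fold

      const Def &def = d->second;
      long tmp;
      switch (def.kind) {
      case Def::CONST:
        *out = def.value;
        return true;
      case Def::COPY:
        return fold_backwards(def.operand, defs, known, out);
      case Def::ADD:
        if (!fold_backwards(def.operand, defs, known, &tmp)) return false;
        *out = tmp + def.value;
        return true;
      }
      return false;
    }

    int main() {
      // Hypothetical program:  b = a + 1;  c = b;  if (c == 3) ...
      std::map<std::string, Def> defs = {
        {"b", {Def::ADD, 1, "a"}},
        {"c", {Def::COPY, 0, "b"}},
      };
      // On the path from one predecessor, 'a' is known to be 2.
      std::map<std::string, long> known = {{"a", 2}};

      long lhs;
      if (fold_backwards("c", defs, known, &lhs))
        printf("condition folds to %s on this path\n",
               lhs == 3 ? "true" : "false");
      else
        printf("condition does not fold on this path\n");
      return 0;
    }

[A real pass would of course operate on GIMPLE statements and SSA names
rather than strings, and would record which incoming edge supplies the
known values so the threaded path can be duplicated.]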
Btw, I once thought about doing on-the-fly lattice use/update and folding
during basic-block copying (or even re-generating expressions via a
simplifying gimple_build ()).  Or have a substitute-and-fold-like facility
that can run on SEME regions and do this.

Richard.

>> It appears you've built at least parts of the two pieces needed to do all
>> this as a Bodik style optimizer.  Which is exactly the long term
>> direction I think this code ought to take.
>>
>> >One of the reasons I think we see more branches is that in SESE region copying we
>> >do not use the knowledge of the value of the condition for the last branch in a
>> >jump-thread path: we rely on other propagation passes to remove the branch.  The
>> >last attached patch adds:
>> >
>> >  /* Remove the last branch in the jump thread path.  */
>> >  remove_ctrl_stmt_and_useless_edges (region_copy[n_region - 1], exit->dest);
>>
>> That's certainly a possibility.  But I would expect that even with
>> this limitation something would be picking up the fact that the
>> branch is statically computable (even if it's an RTL optimizer).
>> But it's definitely something to look for.
>>
>> >Please let me know if the attached patches are producing better results on gcc.
>>
>> For the trunk:
>>   instructions: 1339016494968
>>   branches:      243568982489
>>
>> First version of your patch:
>>   instructions: 1339739533291
>>   branches:      243806615986
>>
>> Latest version of your patch:
>>   instructions: 1339749122609
>>   branches:      243809838262
>
> I think I got about the same results.
>
> I got my scripts installed on the gcc-farm.  I first used gcc75, an x86_64
> machine, where valgrind was crashing, not recognizing how to decode an
> instruction.  Then I moved to gcc112, a powerpc64-linux machine, where I got
> the following data from the stage2 cc1plus compiling the same file alias.ii
> at -O2.  (I did 3 runs of each, mostly because there is a bit of noise in
> all these numbers.)
>
> $ valgrind --tool=cachegrind --cache-sim=no --branch-sim=yes ./cc1plus -O2 ~/alias.ii
>
> all 4 patches:
>
> ==153617== I refs:        13,914,038,211
> ==153617==
> ==153617== Branches:       1,926,407,760  (1,879,827,481 cond + 46,580,279 ind)
> ==153617== Mispredicts:      144,890,904  (  132,094,105 cond + 12,796,799 ind)
> ==153617== Mispred rate:             7.5%  (          7.0%     +       27.4%  )
>
> ==34993== I refs:         13,915,335,629
> ==34993==
> ==34993== Branches:        1,926,597,919  (1,880,017,558 cond + 46,580,361 ind)
> ==34993== Mispredicts:       144,974,266  (  132,177,440 cond + 12,796,826 ind)
> ==34993== Mispred rate:              7.5%  (          7.0%     +       27.4%  )
>
> ==140841== I refs:        13,915,334,459
> ==140841==
> ==140841== Branches:       1,926,597,819  (1,880,017,458 cond + 46,580,361 ind)
> ==140841== Mispredicts:      144,974,296  (  132,177,470 cond + 12,796,826 ind)
> ==140841== Mispred rate:             7.5%  (          7.0%     +       27.4%  )
>
> patch 1:
>
> ==99902== I refs:         13,915,069,710
> ==99902==
> ==99902== Branches:        1,926,963,813  (1,880,376,148 cond + 46,587,665 ind)
> ==99902== Mispredicts:       145,501,564  (  132,656,576 cond + 12,844,988 ind)
> ==99902== Mispred rate:              7.5%  (          7.0%     +       27.5%  )
>
> ==3907== I refs:          13,915,082,469
> ==3907==
> ==3907== Branches:         1,926,965,218  (1,880,377,471 cond + 46,587,747 ind)
> ==3907== Mispredicts:        145,501,569  (  132,656,554 cond + 12,845,015 ind)
> ==3907== Mispred rate:               7.5%  (          7.0%     +       27.5%  )
>
> ==44271== I refs:         13,915,111,997
> ==44271==
> ==44271== Branches:        1,926,968,863  (1,880,380,952 cond + 46,587,911 ind)
> ==44271== Mispredicts:       145,501,858  (  132,656,789 cond + 12,845,069 ind)
> ==44271== Mispred rate:              7.5%  (          7.0%     +       27.5%  )
>
> master no-patch:
>
> ==129233== I refs:        13,910,221,913
> ==129233==
> ==129233== Branches:       1,925,715,095  (1,879,277,776 cond + 46,437,319 ind)
> ==129233== Mispredicts:      144,133,332  (  131,510,534 cond + 12,622,798 ind)
> ==129233== Mispred rate:             7.4%  (          6.9%     +       27.1%  )
>
> ==147659== I refs:        13,910,216,249
> ==147659==
> ==147659== Branches:       1,925,714,029  (1,879,276,708 cond + 46,437,321 ind)
> ==147659== Mispredicts:      144,127,970  (  131,505,172 cond + 12,622,798 ind)
> ==147659== Mispred rate:             7.4%  (          6.9%     +       27.1%  )
>
> ==155206== I refs:        13,910,201,237
> ==155206==
> ==155206== Branches:       1,925,712,267  (1,879,275,030 cond + 46,437,237 ind)
> ==155206== Mispredicts:      144,128,313  (  131,505,542 cond + 12,622,771 ind)
> ==155206== Mispred rate:             7.4%  (          6.9%     +       27.1%  )
>
> I think that there are about 5 million more instructions executed with the
> first patch, and the other patches on top do not really help.
>
>> Which is in the noise for this test.  Which makes me wonder if I
>> botched something on the latest run.  It doesn't appear so, but I'm
>> re-running just to be sure.  I'm also turning on -g so that I can
>> use cg_annotate to poke a bit deeper and perhaps identify one or
>> more concrete examples where your patch is making this worse.
>
> Thanks,
> Sebastian
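[Editorial note: taking the first quoted run from each set above, the deltas
behind the "about 5 million" estimate work out to roughly:

    patch 1 vs. master, I refs:      13,915,069,710 - 13,910,221,913 ~= 4.85 million more instructions
    patch 1 vs. master, branches:     1,926,963,813 -  1,925,715,095 ~= 1.25 million more branches
    all 4 patches vs. master, I refs: 13,914,038,211 - 13,910,221,913 ~= 3.82 million more instructions]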