Jeff Law wrote:
> On 11/23/14 15:22, Sebastian Pop wrote:
> > The second patch attached limits the search for FSM jump threads
> > to loops.  With that patch, we are now down to 470 jump threads in
> > an x86_64-linux bootstrap (and 424 jump threads on powerpc64-linux
> > bootstrap.)
>
> Yea, that was one of the things I was going to poke at as well as a
> quick scan of your patch gave me the impression it wasn't limited to
> loops.
>
> Again, I haven't looked much at the patch, but I got the impression
> you're doing a backwards walk through the predecessors to discover
> the result of the COND_EXPR.  Correct?

Yes.
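To make that concrete, here is a made-up reduced example (invented for
illustration, not taken from the patch or from any benchmark) of the
kind of loop the FSM threader is after; the comments describe how the
backward walk through the predecessors resolves the COND_EXPR:

  /* A tiny state machine in a loop.  Starting from the condition
     "state == 1" and walking backwards through the predecessors, the
     path coming from the "state = 1" block around the loop back edge
     substitutes the constant 1 and folds the condition to true; the
     FSM threader can then duplicate that path so it branches straight
     to the "n++" block instead of re-testing "state".  */

  int
  count_after_a (const char *s)
  {
    int state = 0, n = 0;

    while (*s)
      {
        if (state == 1)
          {
            n++;
            state = 2;
          }
        else if (*s == 'a')
          /* From here, the value of "state" at the next "state == 1"
             test, reached through the loop back edge, is known to be
             1, so that test is statically computable (the "*s" exit
             test of course still has to stay on the duplicated
             path).  */
          state = 1;
        else
          state = 0;
        s++;
      }

    return n;
  }

This is also why removing the last branch in the duplicated region
matters: on the copied path the result of that condition is already
known.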
> That's something I'd been wanting to do -- basically start with a
> COND_EXPR, then walk the dataflow backwards substituting values into
> the COND_EXPR (possibly creating non-gimple).  Ultimately the goal
> is to substitute and fold, getting to a constant :-)
>
> The forward exhaustive stuff we do now is, crazy.  The backwards
> approach could be decoupled from DOM & VRP into an independent pass,
> which I think would be wise.
>
> Using a SEME region copier is also something I really wanted to do
> long term.  In fact, I believe a lot of tree-ssa-threadupdate.c
> ought to be ripped out and replaced with a SEME based copier.

I did an experiment along these lines over the week-end, and now that
you mention it, I feel less shy about mentioning it: the patch does
not yet pass bootstrap, and there are still about 20 failing
test-cases.  I feel better reading the code generation part of jump
threading after this patch ;-)  Basically, I think all of
tree-ssa-threadupdate.c can be replaced by duplicate_seme_region,
which generalizes the code generation.

> It appears you've built at least parts of two pieces needed to all
> this as a Bodik style optimizer.  Which is exactly the long term
> direction I think this code ought to take.
>
> > One of the reasons I think we see more branches is that in sese
> > region copying we do not use the knowledge of the value of the
> > condition for the last branch in a jump-thread path: we rely on
> > other propagation passes to remove the branch.  The last attached
> > patch adds:
> >
> >   /* Remove the last branch in the jump thread path. */
> >   remove_ctrl_stmt_and_useless_edges (region_copy[n_region - 1],
> >                                       exit->dest);
>
> That's certainly a possibility.  But I would expect that even with
> this limitation something would be picking up the fact that the
> branch is statically computable (even if it's an RTL optimizer).
> But it's definitely something to look for.
>
> > Please let me know if the attached patches are producing better
> > results on gcc.
>
> For the trunk:
>   instructions: 1339016494968
>   branches:      243568982489
>
> First version of your patch:
>   instructions: 1339739533291
>   branches:      243806615986
>
> Latest version of your patch:
>   instructions: 1339749122609
>   branches:      243809838262

I think I got about the same results.  I got my scripts installed on
the gcc-farm.  I first tried gcc75, an x86_64 machine, but valgrind
was crashing there, not recognizing how to decode an instruction.
Then I moved to gcc112, a powerpc64-linux machine, where I collected
the following data from the stage2 cc1plus compiling the same file,
alias.ii, at -O2.  (I did 3 runs of each configuration, mostly because
there is a bit of noise in all these numbers.)

$ valgrind --tool=cachegrind --cache-sim=no --branch-sim=yes ./cc1plus -O2 ~/alias.ii

all 4 patches:

==153617== I   refs:      13,914,038,211
==153617==
==153617== Branches:       1,926,407,760  (1,879,827,481 cond + 46,580,279 ind)
==153617== Mispredicts:      144,890,904  (  132,094,105 cond + 12,796,799 ind)
==153617== Mispred rate:             7.5% (          7.0%     +       27.4%   )

==34993== I   refs:      13,915,335,629
==34993==
==34993== Branches:       1,926,597,919  (1,880,017,558 cond + 46,580,361 ind)
==34993== Mispredicts:      144,974,266  (  132,177,440 cond + 12,796,826 ind)
==34993== Mispred rate:             7.5% (          7.0%     +       27.4%   )

==140841== I   refs:      13,915,334,459
==140841==
==140841== Branches:       1,926,597,819  (1,880,017,458 cond + 46,580,361 ind)
==140841== Mispredicts:      144,974,296  (  132,177,470 cond + 12,796,826 ind)
==140841== Mispred rate:             7.5% (          7.0%     +       27.4%   )

patch 1:

==99902== I   refs:      13,915,069,710
==99902==
==99902== Branches:       1,926,963,813  (1,880,376,148 cond + 46,587,665 ind)
==99902== Mispredicts:      145,501,564  (  132,656,576 cond + 12,844,988 ind)
==99902== Mispred rate:             7.5% (          7.0%     +       27.5%   )

==3907== I   refs:      13,915,082,469
==3907==
==3907== Branches:       1,926,965,218  (1,880,377,471 cond + 46,587,747 ind)
==3907== Mispredicts:      145,501,569  (  132,656,554 cond + 12,845,015 ind)
==3907== Mispred rate:             7.5% (          7.0%     +       27.5%   )

==44271== I   refs:      13,915,111,997
==44271==
==44271== Branches:       1,926,968,863  (1,880,380,952 cond + 46,587,911 ind)
==44271== Mispredicts:      145,501,858  (  132,656,789 cond + 12,845,069 ind)
==44271== Mispred rate:             7.5% (          7.0%     +       27.5%   )

master no-patch:

==129233== I   refs:      13,910,221,913
==129233==
==129233== Branches:       1,925,715,095  (1,879,277,776 cond + 46,437,319 ind)
==129233== Mispredicts:      144,133,332  (  131,510,534 cond + 12,622,798 ind)
==129233== Mispred rate:             7.4% (          6.9%     +       27.1%   )

==147659== I   refs:      13,910,216,249
==147659==
==147659== Branches:       1,925,714,029  (1,879,276,708 cond + 46,437,321 ind)
==147659== Mispredicts:      144,127,970  (  131,505,172 cond + 12,622,798 ind)
==147659== Mispred rate:             7.4% (          6.9%     +       27.1%   )

==155206== I   refs:      13,910,201,237
==155206==
==155206== Branches:       1,925,712,267  (1,879,275,030 cond + 46,437,237 ind)
==155206== Mispredicts:      144,128,313  (  131,505,542 cond + 12,622,771 ind)
==155206== Mispred rate:             7.4% (          6.9%     +       27.1%   )

I think there are about 5 million more instructions executed with the
first patch, and the other patches on top do not really help.

> Which is in the noise for this test.  Which makes me wonder if I
> botched something on the latest run.  It doesn't appear so, but I'm
> re-running just to be sure.  I'm also turning on -g so that I can
> use cg_annotate to poke a bit deeper and perhaps identify one or
> more concrete examples where your patch is making this worse.

Thanks,
Sebastian