On Mon, Nov 24, 2014 at 11:05 PM, Sebastian Pop <seb...@gmail.com> wrote:
> Jeff Law wrote:
>> On 11/23/14 15:22, Sebastian Pop wrote:
>> >The second patch attached limits the search for FSM jump threads to loops.  With
>> >that patch, we are now down to 470 jump threads in an x86_64-linux bootstrap
>> >(and 424 jump threads in a powerpc64-linux bootstrap.)
>> >
>> Yea, that was one of the things I was going to poke at as well, as a
>> quick scan of your patch gave me the impression it wasn't limited to
>> loops.
>>
>> Again, I haven't looked much at the patch, but I got the impression
>> you're doing a backwards walk through the predecessors to discover
>> the result of the COND_EXPR.  Correct?
>
> Yes.
>
>> That's something I'd been wanting to do -- basically start with a
>> COND_EXPR, then walk the dataflow backwards substituting values into
>> the COND_EXPR (possibly creating non-gimple).  Ultimately the goal
>> is to substitute and fold, getting to a constant :-)
>>
>> The forward exhaustive stuff we do now is crazy.  The backwards
>> approach could be decoupled from DOM & VRP into an independent pass,
>> which I think would be wise.
>>
>> Using a SEME region copier is also something I really wanted to do
>> long term.  In fact, I believe a lot of tree-ssa-threadupdate.c
>> ought to be ripped out and replaced with a SEME based copier.
>
> I did an experiment along these lines over the week-end, and now that you
> mention it, I feel less shy to speak about it; the patch does not yet pass
> bootstrap, and there are still about 20 failing test-cases.  I feel better
> reading the code generation part of jump-threading after this patch ;-)
> Basically I think all of tree-ssa-threadupdate.c can be replaced by
> duplicate_seme_region, which generalizes the code generation.
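[Editorial note: as a rough, self-contained illustration of the backwards
substitute-and-fold idea Jeff describes above -- walk the def chain backwards
from the condition's operands, substitute values known on the incoming path,
and try to fold to a constant -- here is a toy C++ sketch.  It is not GCC
code; the mini-IR, names, and example program are all hypothetical.]

    // Toy illustration (not GCC internals) of backwards substitute-and-fold:
    // reduce the operand of a conditional to a constant by walking defs
    // backwards and using values known on the path being threaded.
    #include <cstdio>
    #include <map>
    #include <string>

    // Each SSA-like name is a constant, a copy of another name, or
    // an addition of a name and a constant.
    struct Def {
      enum { CONST, COPY, ADD } kind;
      long value;          // CONST: the constant; ADD: the addend
      std::string operand; // COPY/ADD: the name being used
    };

    // Try to fold NAME to a constant, walking its defs backwards and
    // consulting KNOWN for values that hold on the incoming path.
    static bool fold_backwards(const std::string &name,
                               const std::map<std::string, Def> &defs,
                               const std::map<std::string, long> &known,
                               long *out) {
      auto k = known.find(name);
      if (k != known.end()) { *out = k->second; return true; }

      auto d = defs.find(name);
      if (d == defs.end()) return false; // no def: cannot fold

      const Def &def = d->second;
      long tmp;
      switch (def.kind) {
      case Def::CONST:
        *out = def.value;
        return true;
      case Def::COPY:
        return fold_backwards(def.operand, defs, known, out);
      case Def::ADD:
        if (!fold_backwards(def.operand, defs, known, &tmp)) return false;
        *out = tmp + def.value;
        return true;
      }
      return false;
    }

    int main() {
      // Hypothetical program:  b = a + 1;  c = b;  if (c == 3) ...
      std::map<std::string, Def> defs = {
        {"b", {Def::ADD, 1, "a"}},
        {"c", {Def::COPY, 0, "b"}},
      };
      // On the path from one predecessor, 'a' is known to be 2.
      std::map<std::string, long> known = {{"a", 2}};

      long lhs;
      if (fold_backwards("c", defs, known, &lhs))
        printf("condition folds to %s on this path\n",
               lhs == 3 ? "true" : "false");
      else
        printf("condition does not fold on this path\n");
      return 0;
    }

[A real pass would of course operate on GIMPLE statements and SSA names
rather than strings, and would record which incoming edge supplies the
known values so the threaded path can be duplicated.]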
Btw, I once thought about doing on-the-fly lattice use/update and folding
during basic-block copying (or even re-generating expressions via a
simplifying gimple_build ()).  Or have a substitute-and-fold-like facility
that can run on SEME regions and do this.

Richard.

>> It appears you've built at least parts of the two pieces needed to do all
>> this as a Bodik style optimizer.  Which is exactly the long term
>> direction I think this code ought to take.
>>
>> >One of the reasons I think we see more branches is that in SESE region copying we
>> >do not use the knowledge of the value of the condition for the last branch in a
>> >jump-thread path: we rely on other propagation passes to remove the branch.  The
>> >last attached patch adds:
>> >
>> >  /* Remove the last branch in the jump thread path.  */
>> >  remove_ctrl_stmt_and_useless_edges (region_copy[n_region - 1], exit->dest);
>>
>> That's certainly a possibility.  But I would expect that even with
>> this limitation something would be picking up the fact that the
>> branch is statically computable (even if it's an RTL optimizer).
>> But it's definitely something to look for.
>>
>> >Please let me know if the attached patches are producing better results on gcc.
>>
>> For the trunk:
>>   instructions: 1339016494968
>>   branches:      243568982489
>>
>> First version of your patch:
>>   instructions: 1339739533291
>>   branches:      243806615986
>>
>> Latest version of your patch:
>>   instructions: 1339749122609
>>   branches:      243809838262
>
> I think I got about the same results.
>
> I got my scripts installed on the gcc-farm.  I first used gcc75, an x86_64
> machine, where valgrind was crashing, not recognizing how to decode an
> instruction.  Then I moved to gcc112, a powerpc64-linux machine, where I got
> the following data from the stage2 cc1plus compiling the same file alias.ii
> at -O2.  (I did 3 runs of each, mostly because there is a bit of noise in
> all these numbers.)
>
> $ valgrind --tool=cachegrind --cache-sim=no --branch-sim=yes ./cc1plus -O2 ~/alias.ii
>
> all 4 patches:
>
> ==153617== I refs:        13,914,038,211
> ==153617==
> ==153617== Branches:       1,926,407,760  (1,879,827,481 cond + 46,580,279 ind)
> ==153617== Mispredicts:      144,890,904  (  132,094,105 cond + 12,796,799 ind)
> ==153617== Mispred rate:             7.5%  (          7.0%     +       27.4%  )
>
> ==34993== I refs:         13,915,335,629
> ==34993==
> ==34993== Branches:        1,926,597,919  (1,880,017,558 cond + 46,580,361 ind)
> ==34993== Mispredicts:       144,974,266  (  132,177,440 cond + 12,796,826 ind)
> ==34993== Mispred rate:              7.5%  (          7.0%     +       27.4%  )
>
> ==140841== I refs:        13,915,334,459
> ==140841==
> ==140841== Branches:       1,926,597,819  (1,880,017,458 cond + 46,580,361 ind)
> ==140841== Mispredicts:      144,974,296  (  132,177,470 cond + 12,796,826 ind)
> ==140841== Mispred rate:             7.5%  (          7.0%     +       27.4%  )
>
> patch 1:
>
> ==99902== I refs:         13,915,069,710
> ==99902==
> ==99902== Branches:        1,926,963,813  (1,880,376,148 cond + 46,587,665 ind)
> ==99902== Mispredicts:       145,501,564  (  132,656,576 cond + 12,844,988 ind)
> ==99902== Mispred rate:              7.5%  (          7.0%     +       27.5%  )
>
> ==3907== I refs:          13,915,082,469
> ==3907==
> ==3907== Branches:         1,926,965,218  (1,880,377,471 cond + 46,587,747 ind)
> ==3907== Mispredicts:        145,501,569  (  132,656,554 cond + 12,845,015 ind)
> ==3907== Mispred rate:               7.5%  (          7.0%     +       27.5%  )
>
> ==44271== I refs:         13,915,111,997
> ==44271==
> ==44271== Branches:        1,926,968,863  (1,880,380,952 cond + 46,587,911 ind)
> ==44271== Mispredicts:       145,501,858  (  132,656,789 cond + 12,845,069 ind)
> ==44271== Mispred rate:              7.5%  (          7.0%     +       27.5%  )
>
> master no-patch:
>
> ==129233== I refs:        13,910,221,913
> ==129233==
> ==129233== Branches:       1,925,715,095  (1,879,277,776 cond + 46,437,319 ind)
> ==129233== Mispredicts:      144,133,332  (  131,510,534 cond + 12,622,798 ind)
> ==129233== Mispred rate:             7.4%  (          6.9%     +       27.1%  )
>
> ==147659== I refs:        13,910,216,249
> ==147659==
> ==147659== Branches:       1,925,714,029  (1,879,276,708 cond + 46,437,321 ind)
> ==147659== Mispredicts:      144,127,970  (  131,505,172 cond + 12,622,798 ind)
> ==147659== Mispred rate:             7.4%  (          6.9%     +       27.1%  )
>
> ==155206== I refs:        13,910,201,237
> ==155206==
> ==155206== Branches:       1,925,712,267  (1,879,275,030 cond + 46,437,237 ind)
> ==155206== Mispredicts:      144,128,313  (  131,505,542 cond + 12,622,771 ind)
> ==155206== Mispred rate:             7.4%  (          6.9%     +       27.1%  )
>
> I think that there are about 5 million more instructions executed with the
> first patch, and the other patches on top do not really help.
>
>> Which is in the noise for this test.  Which makes me wonder if I
>> botched something on the latest run.  It doesn't appear so, but I'm
>> re-running just to be sure.  I'm also turning on -g so that I can
>> use cg_annotate to poke a bit deeper and perhaps identify one or
>> more concrete examples where your patch is making this worse.
>
> Thanks,
> Sebastian
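[Editorial note: taking the first quoted run from each set above, the deltas
behind the "about 5 million" estimate work out to roughly:

    patch 1 vs. master, I refs:      13,915,069,710 - 13,910,221,913 ~= 4.85 million more instructions
    patch 1 vs. master, branches:     1,926,963,813 -  1,925,715,095 ~= 1.25 million more branches
    all 4 patches vs. master, I refs: 13,914,038,211 - 13,910,221,913 ~= 3.82 million more instructions]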