Re: [Patch] Improving jump-thread pass for PR 54742

Jeff Law Sat, 22 Nov 2014 11:32:19 -0800

On 11/18/14 15:19, Sebastian Pop wrote:

The regions that we duplicate start inside a loop and stay inside the same loop,
and the jump threading path is not allowed to go in deeper nested loops.


The reason why we need to modify the sese duplication function is that the sese
region that we need to duplicate starts at an arbitrary place inside the loop,
whereas the current user of the sese duplication function tree-ssa-loop-ch.c:245
starts at the edge entering the loop and exits at the latch edge.

>I'll leave the rest to Jeff but it looks good to me from an overall
>structure.
>

Thanks for your review.

Sebastian

PS: Patch passed bootstrap and regtest on x86_64-linux.

PS: I have run some perf analysis with the patch:
- on a bootstrap of GCC I see 3209 FSM jump threads
- libpng and libjpeg contain FSM jump threads, the perf increase is in the 1%
   (measured on simulators and reduced data sets)
- coremark gets jump threaded (as expected)
- I'm setting up the llvm test-suite and I will report perf differences

So that's *far* more jump threads than I would expect this to find in abootstrap of GCC -- like 3 orders of magnitude more than I'd expect to find.


I haven't dug deep, but the first level analysis is not encouraging.

Basically I used the trunk compiler with and without your patch to buildgcc-4.7.3's cc1 (4.7.3 simply because that's what I last used thistesting framework). So that gives me two cc1s that I then use tocompile a bunch of .i files under valgrind's (cachegrind) control.


valgrind --tool=cachegrind --cache-sim=no --branch-sim=yes ......

That gives me two hunks of data for each input file I test.Specifically I get the dynamic number of instructions and the dynamicnumber of branches executed.

For jump threading those values correspond directly to the effect we'relooking for -- a reduction in dynamic conditional jumps and a reductionin dynamic instructions executed. Typically the change in dynamicinstructions executed is 2-3X the change in dynamic conditional jumps --which makes sense as removing the conditional jump usually means weremove a comparison and possibly some setup code as well.

Consistently with your patch those values get worse. Across the entireset of .i files I get


For the trunk:

instructions:1339016494968
branches:     243568982489

With your patch:

instructions:1339739533291
branches:     243806615986

So that's 723038323 more instructions and 237633497 more branches afterinstalling your patch. While we're looking a just under .1% regressionin dynamic branches, that's a terrible result for this work.

I'm not sure if the threads you're optimizing are somehow hiding otherjump threading opportunities or somehow hiding CSE-able jump conditions,mucking up a loop structure or something else but something very bad ishappening here.


If I put Steve's patch through the same testing I get:

instructions:1339006760834
branches:     243565768224

Which you'll note is a *very* slight decrease of 3214265 dynamicbranches as 9734134 total instructions executed.

So I think we need to dig deeper into why the branching behaviour of GCCgets noticeably worse with your patch when it should be as good as orbetter than without your patch.

I know when i was analyzing the last update to this code, I found caseswhere we're much better off taking the shorter jump threading pathwithout a joiner rather than preferring the long path with a joiner.IIRC, the issue was that if we selected the joiner path, then theduplication would create another jump threading opportunity (theoriginal, shorter path without a join) that wouldn't be seen until thenext pass of jump threading.


Jeff

Re: [Patch] Improving jump-thread pass for PR 54742

Reply via email to