Re: Early jump threading

Jeff Law Thu, 18 Aug 2016 12:56:47 -0700

On 08/11/2016 08:02 AM, Jan Hubicka wrote:

Hi,
this patch adds early jump threading pass.  Jump threading is one of most
common cases where estimated profile becomes corrupted, because the branches
are predicted independently beforehand. This patch simply adds early mode to
jump threader that does not permit code growth and thus only win/win threads
are done before profile is constructed.


Naturally this also improves IPA decisions because code size/time is estimated
more acurately.

It is not very cool to add another pass and the jump threader is already
run 5 times. I think incrementally we could drop one of late threaders at least.
I tried to measure compile time effects but they are in wash. Moreover the patch
pays for itself in cc1plus code size:

Before patch to tweak size estimates: 27779964
Current mainline:                     27748900
With patch applied:                   27716173

So despite adding few functions the code size effect is positive which I think
is quite nice.

Given the fact that jump threading seems quite lightweit, I wonder why it is
guarded by flag_expensive_optimizations? Is it expensive in terms of code
size?

The effectivity of individual threading passes is as follows (for tramp3d)

      mainline                              with patch
pass  thread count     profile mismatches   thread count    profile mismatch
early                                       525
1     1853             1900                 316             337
2     4                812                  4               288

So the real question here is what did VRP1 threading do between thestandalone thread1/thread2 passes. ie, it may look tempting toeliminate the standalone thread2 pass, but threading done by VRP1 isexpected to be pushed into the thread{1,2} passes. So removing thread2is premature at this point.


The patch distorts testusite somewhat, in most cases we only want to update
dump files or disable early threading:

+XPASS: gcc.dg/uninit-15.c  (test for warnings, line 13)
+XPASS: gcc.dg/uninit-15.c  (test for warnings, line 23)
+FAIL: gcc.dg/uninit-15.c  (test for warnings, line 24)

This seems like a step backwards to me. We're warning about "j", but inthe context in which its used, we really ought to be warning about "i".And we lost a later warning.

+FAIL: gcc.dg/tree-ssa/pr68198.c scan-tree-dump-times thread1 "Registering FSM" 
1

Moved into early-threading, so adjusting test for that seems appropriate.

+FAIL: gcc.dg/tree-ssa/pr69196-1.c scan-tree-dump thread1 "FSM did not thread around 
loop and would copy too many statements"

I'm not sure how to best check this one. Threading this test tooaggressively results in something like a 2X bloat in the resulting codeon the sparc.

+FAIL: gcc.dg/tree-ssa/ssa-dom-thread-2b.c scan-tree-dump-times thread1 "Jumps 
threaded: 1" 1

Moved into early threading, so adjusting the test for that seemsappropriate to me.

+FAIL: gcc.dg/tree-ssa/ssa-thread-13.c scan-tree-dump thread1 "FSM"

Moved into early threading, so adjusting the test for that seemsappropriate to me.

+FAIL: gcc.dg/tree-ssa/vrp01.c scan-tree-dump-times vrp1 "Folding predicate p_.*to 
1" 1

So the optimization moved into fre1 as a result of early jump threadingwhich is good. We should probably check for that, even if the test isbadly mis-named now.

But this is showing one of the cases where the backwards threader isdeficient. It's not using the conditional as an implied set. It's aknown limitation and I'm hoping Andrew's work makes that an easierproblem to solve.

+FAIL: gcc.dg/tree-ssa/vrp56.c scan-tree-dump thread1 "Jumps threaded: 1"

Moved into early threading, so adjusting the test for that seemsappropriate to me.

+FAIL: gcc.dg/tree-ssa/vrp92.c scan-tree-dump vrp1 "res_.: \\\\[1, 1\\\\]"

Some results adjustment seems to be in order here. I think verifyingthat early threading finds the obvious jump threads, then vrp is able toeliminate the 2nd conditional and the result is a collapsed return i forthe entire function.


This testcase is the now probably unnecesary heuristics (it may still be
relevant in cases we do not thread because of size bounds but its effectivity
may not be worth the maintenance cost):

+FAIL: g++.dg/predict-loop-exit-1.C  -std=gnu++11  scan-tree-dump-times profile_estimate 
"extra loop exit heuristics of edge[^:]*:" 2
+FAIL: g++.dg/predict-loop-exit-1.C  -std=gnu++11  scan-tree-dump-times profile_estimate 
"loop exit heuristics of edge[^:]*:" 3
+FAIL: g++.dg/predict-loop-exit-1.C  -std=gnu++14  scan-tree-dump-times profile_estimate 
"extra loop exit heuristics of edge[^:]*:" 2
+FAIL: g++.dg/predict-loop-exit-1.C  -std=gnu++14  scan-tree-dump-times profile_estimate 
"loop exit heuristics of edge[^:]*:" 3
+FAIL: g++.dg/predict-loop-exit-1.C  -std=gnu++98  scan-tree-dump-times profile_estimate 
"extra loop exit heuristics of edge[^:]*:" 2
+FAIL: g++.dg/predict-loop-exit-1.C  -std=gnu++98  scan-tree-dump-times profile_estimate 
"loop exit heuristics of edge[^:]*:" 3
+FAIL: g++.dg/predict-loop-exit-2.C  -std=gnu++11  scan-tree-dump-times profile_estimate 
"extra loop exit heuristics of edge[^:]*:" 1
+FAIL: g++.dg/predict-loop-exit-2.C  -std=gnu++11  scan-tree-dump-times profile_estimate 
"loop exit heuristics of edge[^:]*:" 2
+FAIL: g++.dg/predict-loop-exit-2.C  -std=gnu++14  scan-tree-dump-times profile_estimate 
"extra loop exit heuristics of edge[^:]*:" 1
+FAIL: g++.dg/predict-loop-exit-2.C  -std=gnu++14  scan-tree-dump-times profile_estimate 
"loop exit heuristics of edge[^:]*:" 2
+FAIL: g++.dg/predict-loop-exit-2.C  -std=gnu++98  scan-tree-dump-times profile_estimate 
"extra loop exit heuristics of edge[^:]*:" 1
+FAIL: g++.dg/predict-loop-exit-2.C  -std=gnu++98  scan-tree-dump-times profile_estimate 
"loop exit heuristics of edge[^:]*:" 2
+FAIL: g++.dg/predict-loop-exit-3.C  -std=gnu++11  scan-tree-dump-times profile_estimate 
"extra loop exit heuristics of edge[^:]*:" 2
+FAIL: g++.dg/predict-loop-exit-3.C  -std=gnu++11  scan-tree-dump-times profile_estimate 
"loop exit heuristics of edge[^:]*:" 3
+FAIL: g++.dg/predict-loop-exit-3.C  -std=gnu++14  scan-tree-dump-times profile_estimate 
"extra loop exit heuristics of edge[^:]*:" 2
+FAIL: g++.dg/predict-loop-exit-3.C  -std=gnu++14  scan-tree-dump-times profile_estimate 
"loop exit heuristics of edge[^:]*:" 3
+FAIL: g++.dg/predict-loop-exit-3.C  -std=gnu++98  scan-tree-dump-times profile_estimate 
"extra loop exit heuristics of edge[^:]*:" 2
+FAIL: g++.dg/predict-loop-exit-3.C  -std=gnu++98  scan-tree-dump-times profile_estimate 
"loop exit heuristics of edge[^:]*:" 3

You're the best judge on how to deal with these. I leave that to yourdiscretion.

If the patch seems acceptable, I will do the updates. One option why I did
not do that is that it seems to be now posisble to pass parameters to passes
from passes.def, so perhaps we do not need early_thread_jumps, but doing so is
consistent with way we handle other early passes.

Your call on this. One of the things in the related literature is alimiter on how far back the use-def chains we go. I'd envisioned addingthat limiter and having early jump threading use a different (muchsmaller) limiter. But I don't think it's critical at this point.


Bootstrapped/regtested x86_64-linux
Honza

        * passes.def (pass_early_thread_jumps): Schedule after forwprop.
        * tree-pass.h (make_pass_early_thread_jumps): Declare.
        * tree-ssa-threadbackward.c (fsm_find_thread_path,
        fsm_find_thread_path, profitable_jump_thread_path,
        fsm_find_control_statement_thread_paths,
        find_jump_threads_backwards): Add speed_p parameter.
        (pass_data_early_thread_jumps): New pass.
        (make_pass_early_thread_jumps): New function.

LGTM.  OK after fixing up the tests.

Jeff

Re: Early jump threading

Reply via email to