On 6/9/21 2:09 PM, Richard Biener wrote:
On Wed, Jun 9, 2021 at 1:50 PM Aldy Hernandez via Gcc <gcc@gcc.gnu.org> wrote:
Hi Jeff. Hi folks.
What started as a foray into severing the old (forward) threader's
dependency on evrp, turned into a rewrite of the backwards threader
code. I'd like to discuss the possibility of replacing the current
backwards threader with a new one that gets far more threads and can
potentially subsume all threaders in the future.
I won't include code here, as it will just detract from the high level
discussion. But if it helps, I could post what I have, which just needs
some cleanups and porting to the latest trunk changes Andrew has made.
Currently the backwards threader works by traversing DEF chains through
PHIs leading to possible paths that start in a constant. When such a
path is found, it is checked to see if it is profitable, and if so, the
constant path is threaded. The current implementation is rather limited
since backwards paths must end in a constant. For example, the
backwards threader can't get any of the tests in
gcc.dg/tree-ssa/ssa-thread-14.c:
if (a && b)
foo ();
if (!b && c)
bar ();
etc.
After my refactoring patches to the threading code, it is now possible
to drop in an alternate implementation that shares the profitability
code (is this path profitable?), the jump registry, and the actual jump
threading code. I have leveraged this to write a ranger-based threader
that gets every single thread the current code gets, plus 90-130% more.
Here are the details from the branch, which should be very similar to
trunk. I'm presenting the branch numbers because they contain Andrew's
upcoming relational query which significantly juices up the results.
New threader:
ethread:65043 (+3.06%)
dom:32450 (-13.3%)
backwards threader:72482 (+89.6%)
vrp:40532 (-30.7%)
Total threaded: 210507 (+6.70%)
This means that the new code gets 89.6% more jump threading
opportunities than the code I want to replace. In doing so, it reduces
the amount of DOM threading opportunities by 13.3% and by 30.7% from the
VRP jump threader. The total improvement across the jump threading
opportunities in the compiler is 6.70%.
However, these are pessimistic numbers...
I have noticed that some of the threading opportunities that DOM and VRP
now get are not because they're smarter, but because they're picking up
opportunities that the new code exposes. I experimented with running an
iterative threader, and then seeing what VRP and DOM could actually get.
This is too expensive to do in real life, but it at least shows what
the effect of the new code is on DOM/VRP's abilities:
Iterative threader:
ethread:65043 (+3.06%)
dom:31170 (-16.7%)
thread:86717 (+127%)
vrp:33851 (-42.2%)
Total threaded: 216781 (+9.90%)
This means that the new code not only gets 127% more cases, but it
reduces the DOM and VRP opportunities considerably (16.7% and 42.2%
respectively). The end result is that we have the possibility of
getting almost 10% more jump threading opportunities in the entire
compilation run.
Yeah, DOM once was iterating ...
You probably have noticed that we have very man (way too many)
'thread' passes, often in close succession with each other or
DOM or VRP. So in the above numbers I wonder if you can break
down the numbers individually for the actual passes (in their order)?
As promised.
*** LEGACY:
ethread42:61152 30.1369% (61152 threads for 30.1% of total)
thread117:29646 14.6101%
vrp118:62088 30.5982%
thread132:2232 1.09997%
dom133:31116 15.3346%
thread197:1950 0.960998%
dom198:10661 5.25395%
thread200:587 0.289285%
vrp201:3482 1.716%
Total: 202914
The above is from current trunk with my patches applied, defaulting to
legacy mode. It follows the pass number nomenclature in the
*.statistics files.
New threader code (This is what I envision current trunk to look with my
patchset):
*** RANGER:
ethread42:64389 30.2242%
thread117:49449 23.2114%
vrp118:46118 21.6478%
thread132:8153 3.82702%
dom133:27168 12.7527%
thread197:5542 2.60141%
dom198:8191 3.84485%
thread200:1038 0.487237%
vrp201:2990 1.40351%
Total: 213038
New threader code running in iterative mode (illustrative purposes only):
*** RANGER: iterative mode
ethread42:64389 28.895%
thread117:76013 34.1113%
vrp118:31823 14.2808%
thread132:7165 3.21534%
dom133:25881 11.6143%
thread197:6226 2.79396%
dom198:7536 3.38183%
thread200:971 0.435743%
vrp201:2834 1.27178%
Total: 222838
The above run in iterative mode is only used to show what the new code
fails to get since dom* and vrp* are still threading some things despite
repeated runs of the new code.
As I've mentioned, I haven't dug deep into what VRP* and DOM* are
threading over the new threader code. Last time I checked it was some
combination of:
a) DOM doing folding memory references into temporaries that could then
be used (even by the ranger-threader code) to resolve paths.
b) Things we should be able to tweak range-ops to get.
c) I did see some simple jumps to jumps, that should be trivial to get.
As this is future work, I didn't bother to look too deeply.
Note that over my 395 .ii files from a bootstrap, not once was the
legacy backward threading code able to get something the new code
couldn't get.
It seems like any threading past the first run of DOM are slim pickings.
Aldy
p.s. The above numbers do not reflect Andrew's upcoming relational work.
It seems there's very little benefit in the threader for it, as most
of the benefit was from the recalculation work.