Hi folks.
I'm done with benchmarking, testing and cleanups, so I'd like to post my
patchset for review. However, before doing so, I'd like to address a
handful of meta-issues that may affect how I post these patches.
Trapping on differences
=======================
Originally I wanted to contribute verification code that would trap if
the legacy code threaded any edges the new code couldn't (to be removed
after a week). However, after having tested on various architectures
and only running once into a missing thread, I'm leaning towards
omitting the verification code, since it's fragile, time consuming, and
quite hacky.
For the record, I have tested on x86-64, aarch64, ppc64 and ppc64le.
There is only one case, across bootstrap and regression tests where the
verification code is ever tripped (discussed below).
Performance
===========
I re-ran benchmarks as per our callgrind suite, and the penalty with the
current pipeline is 1.55% of overall compilation time. As is being
discussed, we should be able to mitigate this significantly by removing
other threading passes.
Failing testcases
=================
I have yet to run into incorrect code being generated, but I have had to
tweak a considerable number of tests. I have verified every single
discrepancy and documented my changes in the testsuite when it merited
doing so. However, there are a couple tests that trigger regressions
and I'd like to ask for guidance on how to address them.
1. gcc.c-torture/compile/pr83510.c
I would like to XFAIL this.
What happens here is that thread1 threads a switch statement such that
the various cases have been split into different independent blocks.
One of these blocks exposes an arr[i_27] access which is later
propagated by VRP to be arr[10]. This is an invalid access, but the
array bounds code doesn't know it is an unreachable path.
However, it is not until dom2 that we "know" that the value of the
switch index is such that the path to arr[10] is unreachable. For that
matter, it is not until dom3 that we remove the unreachable path.
2. -Wfree-nonheap-object
This warning is triggered while cleaning up an auto_vec. I see that the
va_heap::release() inline is wrapped with a pragma ignore
"-Wfree-nonheap-object", but this is not sufficient because jump
threading may alter uses in such a way that may_emit_free_warning() will
warn on the *inlined* location, thus bypassing the pragma.
I worked around this with a mere:
> @@ -13839,6 +13839,7 @@ maybe_emit_free_warning (tree exp)
location_t loc = tree_inlined_location (exp);
+ loc = EXPR_LOCATION (exp);
but this causes a ton of Wfree-nonheap* tests to fail. I think someone
more knowledgeable should address this (msebor??).
3. uninit-pred-9_b.c
The uninit code is getting confused with the threading and the bogus
warning in line 24 is back. I looked at the thread, and it is correct.
I'm afraid all these warnings are quite fragile in the presence of more
aggressive optimizations, and I suspect it will only get worse.
4. libphobos/src/std/net/isemail.d
This is a D test where we don't actually fail, but we trigger the
verification code. It is the only jump threading edge that the new code
fails to get over the old code, and it only happens on ppc64.
It triggers because a BB4 -> BB5 is too expensive to thread, but a BBn
-> BB3 -> BB4 -> BB5 is considered safe to thread because BB3 is a latch
and it alters the profitability equation. The reason we don't get it,
is that we assume that if a X->Y is unprofitable, it is not worth
looking at W->X->Y and so forth.
Jeff had some fancy ideas on how to attack this. Once such idea was to
stop looking back, but only for things we were absolutely sure would
never yield a profitable path. I tried a subset of this, by allowing
further looks on this latch test, but my 1.55% overall performance
penalty turned into an 8.33% penalty. Personally it looks way too
expensive for this one isolated case. Besides, the test where this
clamping code originally came from still succeeds (commit
eab2541b860c48203115ac6dca3284e982015d2c).
CONCLUSION
==========
That's basically it.
If we agree the above things are not big issues, or can be addressed as
follow-ups, I'd like to start the ball rolling on the new threader.
This would allow more extensive testing of the code, and separate it a
bit from the other big changes coming up :).
Aldy