There is a functioning prototype in branch “ssa-range” which is a proof
of concept that the approach is functional as well as quick, and can be
used to answer questions which come up regarding what it can and can’t
do. Our last merge was on April 13th, so it's fairly up to date.
We have implemented a flexible range class (irange) which allows for
multiple subranges to represent a range, and which can be extended in
the future to types other than integral. We use this throughout, but it
could be replaced in the ranger with any similar API. Conversion
routines are also provided to convert from irange to value_range and
value_range to irange.
A full set of tree_code range-op routines are implemented. We have
commoned as much code as possible with the existing VRP range extraction
code. Also, we have added additional code for calculating the other
operands from a known result in numerous cases.
The code base in VRP has been modified (via a flag) to
- Work completely with the native value_range like it does today.
- Use irange and the range-ops component under the covers to
extract ranges. Requests in VRP are then converted from value_ranges to
irange, called into the range-op routines, and then converted back to
value_range for VRP/EVRP’s use.
- Do operations both ways and compare the results to make sure both
agree on the range, and trap if they do not.
The branch defaults to the compare and trap mode to ensure everything is
working correctly. This has been our correctness model for statement
range extraction and was active during the Fedora package builds. The
only time we disabled it was to do performance runs vs RVRP, and were
looking at both branch and trunk times for EVRP and VRP.
Of note, we drop all symbolics in ranges to VARYING on everything except
PLUS and MINUS, which we leave as native calculations if there are
symbolics present. More on symbolics later.
A VRP like pass called RVRP has been implemented.
- The vr-values statement simplification code has been factored out
to be range agnostic, meaning that these routines can operate on either
value_range or irange. Thus, we are using a common code base to perform
statement simplification as well.
- For complete compatibility with EVRP, the RVRP pass builds
dominators and instantiates the SCEV loop information so we have loop
range info available. RVRP does not need this info to run, but would
miss some of the test cases which depend on loop ranges.
- RVRP is set up to demonstrate it can process the IL in multiple
directions and bootstraps/passes all tests in all directions.
* Dominator order
* Post-dominator order
* BB1 thru BBn
* BBn thru BB1
* branch-only mode where only branches at the end of each BB
are examined for folding opportunities
4 additional passes have been converted to use the ranger model:
- sprintf - removed the dominator building/walking
- warn alloca - replaced calls to get global ranges with calls that
now return context sensitive ranges.
- warn restrict - just replaced EVRP range calls with ranger calls.
- backwards threader - enhanced to use contextual range information
to make additional threading decisions.
Symbolic Ranges
One big concern last year expressed was my decision to abolish symbolic
ranges.
I continue to maintain that there is no need to track the range of x_2
as [y_3 + 5, MAX] for x_2 = y_3 + 5. All you need to do is look at the
definition of x_2, and the same information is exposed right there in
the IL. If one requires the symbolic information, the same on-demand
process could lookup that information as well. This in turn, makes the
code for ranges much simpler, easier to maintain, and less likely to
introduce bugs.
We have found through our prototype that symbolics in ranges are not
nearly as prevalent as one would think. During the work to share a
common code base with VRP, we found that symbolic ranges are irrelevant
for everything except PLUS_EXPR and MINUS_EXPR. The shared code in our
branch drops symbolics to varying immediately for everything else, and
it has no impact on EVRP, or VRP or any tests we can find anywhere.
Furthermore, we never trapped while comparing ranges generated by VRP
versus generating them with range-ops which drops symbolics to varying.
We tried modifying VRP such that we don’t even create symbolic
endpoints, but rather revert to VARYING always. We can find no test
case that fails because a range is not calculated properly due to
resolving these endpoints.
There are a few that fail due to the symbolic being used to help track
relationals.. Ie
x_2 = y_3 + 5
If (x_2 > y_3) // can be folded since we know x_2 must be < y_3
VRP generates a range for x of [ y_3+5, MAX ] and at various points
uses that to infer a relational or equivalence. Ie, it becomes easy to
tell that the condition must always be true since the lower bound of the
range is y_3 + 5.
I argue this is not a range question, but rather a different problem
which VRP has chosen to solve by piggybacking on the range
representation. This leads to complications/complexity when trying to
evaluate ranges because they must constantly be on the lookout for
symbolics. This information is then carried around for the life of the
pass, even if it is never used. It also forces any
relational/equivalency queries to be handled within the context of the
VRP pass.
This aspect of symbolics would be handled by a relational/equivalence
processing engine that would be follow on work. Using the same basic
model as ranges, each tree code is taught to understand the relation
between its operands, and then we can answer equivalency and relational
accurately as well. It would be available for any pass to use
independent of ranges. I will expound upon that a bit in the future
directions section.