On 9/9/20 7:03 PM, Martin Sebor wrote:
On 9/3/20 1:14 AM, Aldy Hernandez via Gcc wrote:
Below is a documented we have drafted to provide guidance on using
irange's and converting passes to it. This will future proof any such
passes so that they will work with the ranger, or any other mechanism
using multiple sub-ranges (as opposed to the more limited value_range).
The official document will live here, but is included below for
discussion:
https://gcc.gnu.org/wiki/irange-best-practices
Feel free to respond to this post with any questions or comments.
Thanks for the writeup!
The biggest question on my mind at the moment is probably about
how to go about rewriting classes that maintain their own ranges
(most often as an array of two offset_int) to use one of the Ranger
classes. Two examples are the access_ref class in builtins.h and
the builtin_memref class in gimple-ssa-warn-restrict.c.
Both of these compute the size of an object (as a simple range)
and the offset into it (also as a range). In the future, they will
track of sizes of multiple objects (from PHI nodes).
My thinking is to do this in two steps: In 1) replace each
offset_int[2] member array with an instance of int_range<1> and
then rewrite all the code that manipulates the array with calls
to the ranger API. That should be fairly straightforward. In 2)
This sounds like a sound approach. Instead of passing pairs of
integers, pass an irange and limit yourself to the non-deprecated part
of the API.
replace the simple int_range<1> with something more interesting
(int_range_max?) and rewrite the final code that processes
The ranger will use whatever sized range you pass it. So if you want
fine granularity, pass it an int_range_max.
However, if your final destination in memory is constrained (you're
storing a gazillion of these), you can use the irange_pool to allocate
the minimum amount of storage needed. Note, that the irange_pool has
not been contributed yet. It'll come with the ranger.
Taking the strlen code as an example, you could transform the following:
vr_values *v = CONST_CAST (vr_values *, rvals);
const value_range_equiv *vr = v->get_value_range (si->nonzero_chars);
if (vr->kind () != VR_RANGE || !range_int_cst_p (vr))
return false;
...into:
int_range_max r;
if (!ranger->range_of_expr (r, si->nonzero_chars, si->stmt))
return false;
Where ranger is declared somewhere at the top of your pass. Perhaps in
printf_strlen_execute() and pass it around like you do rvals.
Note that for better precision, you should pass the gimple context.
That is, the statement from which si->nonzero_chars came from. In the
above case, I think you want si->stmt. Currently get_value_range() has
a STMT parameter which is unused. This was the reason for providing
such parameter.
Finally... you can store R into an int_range<XX> depending on how much
space you want to use in your ultimate storage location. The assignment
operator will squish R into whatever space you give it. So the
following will truncate things appropriately:
int_range<3> final_location = r;
However, if you're using irange_pool to allocate just the amount of
space needed, you could do:
// Presumably declared wherever you declared the ranger.
irange_pool pool;
...
...
irange *p = pool.allocate (r);
p will hold a pointer to a range that contains R with no extra space.
This is ideal for longer term storage. For intermediate results, use
int_range_max.
Note that I am already converting sprintf/strlen to use the ranger
instead of evrp/vr_values (aldyh/ranger-staging git branch). So the
work of converting to range_of_expr above, and passing around gimple
context, will already be done with the contribution of ranger. What I
will not provide for now, is the wholesale conversion of other passes to
the "clean" irange API. So if your pass is still using kind() and
stashing away pairs of offset_int's, that will have to be done
separately. I can offer guidance though, and help as time permits.
the results to scan all the subranges for overflow or overlap,
as well as the code that presents them in warnings/notes. It
would be nice to have support for the formatting of ranges in
the pretty-printer to cut down on the repetitive tests that
determine how to format a constant (%E, vs a range [%E, %E],
vs a multi-range [%E, %E][%E, %E], ...[%E, %E]).
I'm not a big fan of including ranges in warnings/errors. The ranger
and the vrp twins, are bound to change over releases with finer and
finer ranges. Having our error messages depend on ranges that may or
may not change can be confusing for users, and it means we must change
regression tests to match a changing compiler at each release.
I would much rather see a generic warning of "P may be out of bounds"
than "P may be [X,Y][Z,A][C,D]". That's just confusing :). Also, note
that the ranger can come up with ranges that are completely non-obvious,
as they pass through PHIs, back edges, and a combination of operations.
This will only add to the user confusion.
I suppose I/we should write this up as a case study after I/we
do the first rewrite.
Longer term, the size of an object (or objects) and an offset
into it/them seems like a generally useful property that would
be best associated with a pointer somehow, so that it could be
queried by an API similar to the one exposed by the irange class.
I imagine it would benefit both warnings and optimizations.
I do have one concern with a wholesale switch over to the ranger
classes and APIs. Being designed with conservative assumptions
suitable for optimizers, the current APIs aren't necessarily
ideal for warnings, A basic example is integer overflow and
wrapping. In the general case, the optimizer must assume that
the range of the argument in 'malloc (n + 1)' might overflow (or
wrap). A warning, though, should detect when it does and complain.
So I'd like to see a mode where the ranger evaluates expressions
in infinite precision.
The ranger will evaluate an expression in the mode of the trees
there-in, nothing more, nothing less. If your range has an underlying
type that wraps, the result will also wrap.
If you want to perform an operation ignoring such constraints, you
probably want to cast to a larger type before doing the operation:
range_cast (range_n, SOME_LARGER_INT_TYPE);
range_cast (range_1, SOME_LARGER_INT_TYPE);
int_range_max result;
op.plus.fold_range (result, SOME_LARGER_INT_TYPE, range_n, range_1);
But in the case of malloc(n+1) above, the argument sits in an SSA, and
the range plus operation was done in the underlying type. There's no
way of changing that. The ranger won't lie to you. If the operation
may wrap or overflow, it will show up as so in the range. I don't think
we have any plans of changing this behavior.
Was that your question?
Aldy