Re: irange best practices document

Martin Sebor via Gcc Wed, 09 Sep 2020 10:04:10 -0700

On 9/3/20 1:14 AM, Aldy Hernandez via Gcc wrote:

Below is a documented we have drafted to provide guidance on usingirange's and converting passes to it. This will future proof any suchpasses so that they will work with the ranger, or any other mechanismusing multiple sub-ranges (as opposed to the more limited value_range).
The official document will live here, but is included below for discussion:

     https://gcc.gnu.org/wiki/irange-best-practices

Feel free to respond to this post with any questions or comments.


Thanks for the writeup!

The biggest question on my mind at the moment is probably about
how to go about rewriting classes that maintain their own ranges
(most often as an array of two offset_int) to use one of the Ranger
classes.  Two examples are the access_ref class in builtins.h and
the builtin_memref class in gimple-ssa-warn-restrict.c.

Both of these compute the size of an object (as a simple range)
and the offset into it (also as a range).  In the future, they will
track of sizes of multiple objects (from PHI nodes).

My thinking is to do this in two steps: In 1) replace each
offset_int[2] member array with an instance of int_range<1> and
then rewrite all the code that manipulates the array with calls
to the ranger API. That should be fairly straightforward. In 2)
replace the simple int_range<1> with something more interesting
(int_range_max?) and rewrite the final code that processes
the results to scan all the subranges for overflow or overlap,
as well as the code that presents them in warnings/notes.  It
would be nice to have support for the formatting of ranges in
the pretty-printer to cut down on the repetitive tests that
determine how to format a constant (%E, vs a range [%E, %E],
vs a multi-range [%E, %E][%E, %E], ...[%E, %E]).

I suppose I/we should write this up as a case study after I/we
do the first rewrite.

Longer term, the size of an object (or objects) and an offset
into it/them seems like a generally useful property that would
be best associated with a pointer somehow, so that it could be
queried by an API similar to the one exposed by the irange class.
I imagine it would benefit both warnings and optimizations.

I do have one concern with a wholesale switch over to the ranger
classes and APIs.  Being designed with conservative assumptions
suitable for optimizers, the current APIs aren't necessarily
ideal for warnings,  A basic example is integer overflow and
wrapping.  In the general case, the optimizer must assume that
the range of the argument in 'malloc (n + 1)' might overflow (or
wrap).  A warning, though, should detect when it does and complain.
So I'd like to see a mode where the ranger evaluates expressions
in infinite precision.

Martin


Aldy & Andrew


INTRODUCTION
------------

irange is a class for storing and manipulating multi-ranges of
integers.  It is meant to be a replacement for value_range's, and can
currently inter-operate seamlessly with them.

Original value_range's can contain a range of integers, say from 10 to
20 inclusive, represented as [10, 20].  It also has a way of
representing an anti-range, which is the inverse of a range.  For
example, everything except 10 to 20 is represented as an anti-range of
~[10, 20].  This is really, shorthand for the union of
[MIN_INT, 9] U [21, MAX_INT].

The value_range representation has the limitation that higher
granularity is not representable without losing precision.  For
example, you cannot specify the range of [10, 15] U [20, 30], instead
it is represented ambiguously with with a range of [10, 30].

On the other hand, multi-ranges with the irange class, can represent
an arbitrary number of sub-ranges.  More formally, multi-ranges have
0 or more non-intersecting sub-ranges with integral bounds.  For
example, you can specify a range containing the numbers of [10, 15]
and [20, 30] with an irange of [10, 15][20, 30].  With irange, instead
of using anti-ranges for ~[10, 20] the underlying number of ranges can
be represented accurately with [MIN_INT, 9][21, MAX_INT].

Multi-ranges are not limited to 1 or 2 sub-ranges.  Instead, you can
specify any number of sub-ranges.  For example:

          int_range<5> five_pairs;
          int_range<2> two_pairs;
          int_range<1> legacy_value_range;
          widest_irange huge_irange;  // currently 255 sub-ranges.

The special case of int_range<1> provides legacy support forvalue_range. Currently, value_range is just a typedef for int_range<1>.


The specially named widest_irange is used for "unlimited" sub-ranges,
and is meant to be used when calculating intermediate results.
Currently it is a large number (255), but could be changed without
prior notice.  Note that "widest" does not have anything to do with
the range of the underlying integers, but the maximum amount of
sub-range pairs available for calculation.

Here are some examples of calculations with different sub-range
granularity:

     // Assume:
     tree n10 = build_int_cst (integer_type_node, 10);
     tree n20 = build_int_cst (integer_type_node, 20);
     tree n30 = build_int_cst (integer_type_node, 30);
     tree n40 = build_int_cst (integer_type_node, 40);
     tree n50 = build_int_cst (integer_type_node, 50);
     tree n60 = build_int_cst (integer_type_node, 60);

int_range<1> (aka value_range):

     int_range<1> r1 (n10, n20);
     int_range<1> r2 (n30, n40);
     r1.union_ (r2);
     r1.dump ();

     // Outputs: [10, 40]

     // Note: Since result doesn't fit in an int_range<1>, it is
     // conservatively truncated.

int_range<2>:

     int_range<2> r1 (n10, n20);
     int_range<2> r2 (n30, n40);
     r1.union_ (r2);
     r1.dump ();

     // Outputs: [10, 20][30, 40]

     int_range<2> r3 (n50, n60);
     r1.union_ (r3);
     r1.dump ();

     // Outputs: [10, 20][30, 60]

     // Note: Unioning [10, 20][30, 40] and [50, 60] doesn't fit
     // into an int_range<2>, so it is truncated.

     int_range<1> small;
     small = r1;  // r1 is [10,20][30,60] as above.
     small.dump ();

     // Outputs: [10, 60]

     // Note: Copying a larger range into a smaller range will
     // result in truncating the result.

widest_irange:

     widest_irange r;
     for (i = 0; i <= 40; i += 10) {
         tree low = build_int_cst (integer_type_node, i);
         tree high = build_int_cst (integer_type_node, i + 1);
         int_range<2> tmp (low, high);
         r.union_ (tmp);
     }
     r.dump ();

     // Outputs: [0, 1][10, 11][20, 21][30, 31][40, 41]

WRITE YOUR CODE FOR INFINITE SUB-RANGES
---------------------------------------

Your code should be agnostic to the sub-ranges there-in.  For example,
if you wanted to check if the numbers 5 through 10 are in a range R,
avoid looking at min/max/kind and having to special case VR_RANGE,
VR_ANTI_RANGE, and VR_VARYING.  Instead do something like this:

widest_irange my_range (build_int_cst (5, integer_type_node),
                       build_int_cst (10, integer_type_node));
my_range.intersect (R);
if (R == my_range)
     return goodness;

If on the other hand, if you want to check if there is ANY of thenumbers from 5 through 10 are in the range, you could do:


my_range.intersect (R);
if (!R.undefined_p ())
     return goodness;

First, don't be afraid to build throw-away ranges. They're cheap,light-weight, and don't require memory allocations.


Also, you could also have declared my_range as containing a max of 3
sub-ranges with:

     int_range<3> my_range;

However, the result of the intersect code above will be limited to 3
sub-ranges.  Sometimes 3 is all you need, but we suggest that

intermediate results be done in widest_irange (currently 255 sub-ranges)and then the result be copied to whatever range you wish to return.


For example:

void some_calculation(irange &result, const irange &a, const irange &b)
{
     widest_irange intermediate = a;
     do_stuff (intermediate, b);
     result.intersect (intermediate);
}


Note that operations on ranges are done in the granularity of the
range itself, not its argument.  For example, BIG.intersect(SMALL) is
done in the granularity of BIG.  On the other hand, if SMALL is an
int_range<1> and BIG is an int_range<5>, SMALL.intersect(BIG) will be
squished down to the granularity of SMALL.

Example of iterating through sub-ranges in a range:

     if (r.undefined_p ())
         return;
     for (unsigned i = 0; i < r.num_pairs (); ++i) {
         wide_int x = r.lower_bound (i);
         wide_int y = r.upper_bound (i);
         // do stuff with x/y
     }

     // Note: undefined_p() is syntactic sugar for num_pairs() == 0.

FUNCTION SHOULD TAKE IN IRANGE'S AND STICK TO THE API
-----------------------------------------------------

All functions taking in value_range's should be rewritten to take in
an irange instead.  Taking in the irange base class guarantees your
code works with any range.

You should keep to the main API which is listed in value-range.h,
while avoiding the methods marked with DEPRECATED, especially kind (),
min (), max (), symbolic_p (), which are all slated to be removed.

For example, symbolic_p should be avoided, because 9 out of 10 times
passes don't need symbolics.  Exceptions to these are the VRP twins
(evrp / VRP) which need it internally.  But virtually all passes that
currently use the evrp_range_analyzer, never do anything with a
symbolic range returned by get_value_range.  Also, get_range_info()
returns global range information, and does not use symbolics.

The API normalizes symbolics into integers, so you can use it and
assume the range contains only numbers.  For example, a symbolic of
[5,X] is normalized into [5,MAX] when accessed through the API.  That
is, the lower bound of the range is 5, and the upper bound is
TYPE_MAX_VALUE for the type.  Similarly, a symbolic of [A, B] is
normalized to [MIN,MAX].

If your functions take in an irange and limit themselves to the
non-deprecated API, your code will be guaranteed to work with either
one or an infinite amount of sub-ranges when available.

Note that if your pass needs symbolics and you're not evrp/vrp
(really, you don't), your ability to make use of multi-ranges is
limited to operations involving 2 constant ranges, and most operations
will be truncated down to a value_range, thus defeating the purpose.
If your pass is like 90% of passes, just stick to the API, use
iranges, and you'll be fine.

Keep in mind that passing a multi-range into the EVRP eco-system won't
buy you anything, as it is written entirely with value_range's.  You
need to pass a multi-range to the ranger, or range-ops to get anything
with multi-ranges back.

ANTI-RANGES
-----------

Anti ranges don't really exist.  Although they currently exist in the
internal representation of the legacy code, you shouldn't be checking
for them, and you shouldn't be treating them specially.  You should be
asking for the bounds of a range or sub-range, not for anti-range in
particular.

For example, ~[5,10] is really [0..4][11,MAX], and should be treated
as such.

Cumbersome operations that were previous done with anti-ranges are
much easier now.  For example, previously if you were to write a
predicate to check if X is in range R you would have to:

check that R is not undefined_p()
check that R is not varying_p()
check that R is either a VR_RANGE or a VR_ANTI_RANGE
check that R is a constant_p (that is, is not a symbolic)
check that R.kind()==VR_RANGE && X >= R.min() && X <= R.max()
or....that R.kind()==VR_ANTI_RANGE && X < R.min() && X > R.max()

With the new API you check membership with:

R.contains_p(X)

VARYING AND UNDEFINED RANGES
----------------------------

The old VARYING nomenclature is now just syntactic sugar for a range of[MIN,MAX]. You can safely assume that any range (with the exception ofan undefined range) has bounds. So the lower_bound() of a range thathas a varying is the minimum value for the type. You shouldn't have tospecial case handling varying ranges. They're just ranges that span theentire domain of the underlying type.


Undefined ranges are syntactic sugar for the empty set (that is
R.num_pairs() == 0).  Since you obviously can't ask for the bounds of
an undefined_p() range, you may have to check that a range is not the
empty set first before asking for its bounds.

RETURNING RANGE RESULTS FROM A FUNCTION
---------------------------------------

Note, that functions that return ranges should do so by passing a

reference (or pointer, though we prefer references). The reason is thatyou want the caller to decide what sub-range granularity they desire.Otherwise you end up returning by value an int_range<5>, when all theuser wanted was an int_range<2>. Or worse, you truncate the resultswhen the user wants a higher precision.

There are various setters that are quicker than full-on copying. Youshould use them when possible. For example,


instead of:
     result = int_range<1> (build_zero_cst (integer_type_node),
                    build_zero_cst (integer_type_node));

do:
     result.set_zero (integer_type_node);

instead of:
     result = int_range<1> (a, b);

do:
     result.set (a, b);

VALUE_RANGE'S ARE NOT EVIL, BUT ARE SEVERELY LIMITED
----------------------------------------------------

Note that int_range<1> and value_range are the same thing.  In fact,
value_range is typedef'd to be an int_range<1>.  However, we prefer
int_range<1>, as value_range is deprecated because it implies legacy
support.  Do not that your functions should take in irange's, not
int_range<> or value_range, as this this ensures that it will work
with any passed range.

PERSISTENT IRANGES
------------------

If you wish to persistently keep iranges beyond the scope of your
function, we will provide an irange_pool class when the ranger is
contributed.  It will provide a mechanism for dynamically allocating
ranges.  More details will follow, but the general gist is:

     irange_pool pool;

     // Allocate an irange of 5 sub-ranges.
     irange *p = pool.allocate (5);

     // Allocate an irange of 3 sub-ranges.
     irange *q = pool.allocate (3);

     // Allocate an irange with as many sub-ranges as are currently
     // used in "some_other_range".
     irange *r = pool.allocate (some_other_range);

In the above snippets, the "pool" object will deallocate any
ranges at destruction.

Re: irange best practices document

Reply via email to