On 11/5/20 8:08 PM, Andrew MacLeod wrote:
On 11/5/20 7:50 PM, Martin Sebor wrote:
On 11/5/20 5:02 PM, Andrew MacLeod wrote:
On 11/5/20 4:02 PM, Martin Sebor wrote:
On 11/5/20 12:29 PM, Martin Sebor wrote:
On 10/1/20 11:25 AM, Martin Sebor wrote:
On 10/1/20 9:34 AM, Aldy Hernandez wrote:
On 10/1/20 3:22 PM, Andrew MacLeod wrote:
> On 10/1/20 5:05 AM, Aldy Hernandez via Gcc-patches wrote:
>>> Thanks for doing all this! There isn't anything I don't
understand
>>> in the sprintf changes so no questions from me (well, almost
none).
>>> Just some comments:
>> Thanks for your comments on the sprintf/strlen API conversion.
>>
>>> The current call statement is available in all functions
that take
>>> a directive argument, as dir->info.callstmt. There should
be no need
>>> to also add it as a new argument to the functions that now
need it.
>> Fixed.
>>
>>> The change adds code along these lines in a bunch of places:
>>>
>>> + value_range vr;
>>> + if (!query->range_of_expr (vr, arg, stmt))
>>> + vr.set_varying (TREE_TYPE (arg));
>>>
>>> I thought under the new Ranger APIs when a range couldn't be
>>> determined it would be automatically set to the maximum for
>>> the type. I like that and have been moving in that direction
>>> with my code myself (rather than having an API fail, have it
>>> set the max range and succeed).
>> I went through all the above idioms and noticed all are being
used on
>> supported types (integers or pointers). So range_of_expr
will always
>> return true. I've removed the if() and the set_varying.
>>
>>> Since that isn't so in this case, I think it would still be
nice
>>> if the added code could be written as if the range were set to
>>> varying in this case and (ideally) reduced to just
initialization:
>>>
>>> value_range vr = some-function (query, stmt, arg);
>>>
>>> some-function could be an inline helper defined just for the
sprintf
>>> pass (and maybe also strlen which also seems to use the same
pattern),
>>> or it could be a value_range AKA irange ctor, or it could be
a member
>>> of range_query, whatever is the most appropriate.
>>>
>>> (If assigning/copying a value_range is thought to be too
expensive,
>>> declaring it first and then passing it to that helper to set it
>>> would work too).
>>>
>>> In strlen, is the removed comment no longer relevant? (I.e.,
does
>>> the ranger solve the problem?)
>>>
>>> - /* The range below may be "inaccurate" if a constant
has been
>>> - substituted earlier for VAL by this pass that
hasn't been
>>> - propagated through the CFG. This shoud be fixed by
the new
>>> - on-demand VRP if/when it becomes available
(hopefully in
>>> - GCC 11). */
>> It should.
>>
>>> I'm wondering about the comment added to
get_range_strlen_dynamic
>>> and other places:
>>>
>>> + // FIXME: Use range_query instead of global ranges.
>>>
>>> Is that something you're planning to do in a followup or should
>>> I remember to do it at some point?
>> I'm not planning on doing it. It's just a reminder that it
would be
>> beneficial to do so.
>>
>>> Otherwise I have no concern with the changes.
>> It's not cleared whether Andrew approved all 3 parts of the
patchset
>> or just the valuation part. I'll wait for his nod before
committing
>> this chunk.
>>
>> Aldy
>>
> I have no issue with it, so OK.
Pushed all 3 patches.
>
> Just an observation that should be pointed out, I believe Aldy
has all
> the code for converting to a ranger, but we have not pursued
that any
> further yet since there is a regression due to our lack of
equivalence
> processing I think? That should be resolved in the coming
month, but at
> the moment is a holdback/concern for converting these
passes... iirc.
Yes. Martin, the take away here is that the strlen/sprintf pass
has been converted to the new API, but ranger is still not up and
running on it (even on the branch).
With the new API, all you have to do is remove all instances of
evrp_range_analyzer and replace them with a ranger. That's it.
Below is an untested patch that would convert you to a ranger
once it's contributed.
IIRC when I enabled the ranger for your pass a while back, there
was one or two regressions due to missing equivalences, and the
rest were because the tests were expecting an actual specific
range, and the ranger returned a slightly different/better one.
You'll need to adjust your tests.
Ack. I'll be on the lookout for the ranger commit (if you hppen
to remember and CC me on it just in case I might miss it that would
be great).
I have applied the patch and ran some tests. There are quite
a few failures (see the list below). I have only looked at
a couple. The one in in gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
boils down to the following test case. There should be no warning
for either sprintf call. The one in h() is a false positive and
the reason for at least some of the regressions. Somehow,
the conversions between int and char are causing Ranger to lose
the range.
$ cat t.c && gcc -O2 -S -Wall t.c
char a[2];
extern int x;
signed char f (int min, int max)
{
signed char i = x;
return i < min || max < i ? min : i;
}
void ff (signed char i)
{
__builtin_sprintf (a, "%i", f (0, 9)); // okay
}
signed char g (signed char min, signed char max)
{
signed char i = x;
return i < min || max < i ? min : i;
}
void gg (void)
{
__builtin_sprintf (a, "%i", g (0, 9)); // bogus warning
}
t.c: In function ‘gg’:
t.c:24:26: warning: ‘%i’ directive writing between 1 and 4 bytes
into a region of size 2 [-Wformat-overflow=]
24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning
| ^~
t.c:24:25: note: directive argument in the range [-128, 127]
24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning
| ^~~~
t.c:24:3: note: ‘__builtin_sprintf’ output between 2 and 5 bytes
into a destination of size 2
24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Another failure (the one in builtin-sprintf-warn-22.c) is this where
the false positive is due to the strlen pass somehow having lost
the upper bound on the sum of the two string lengths.
I'm guessing this might have something to do with the strlen pass
calling set_range_info() on the nonconstant results of strlen calls
it can determine the range for. How would I go about doing it with
ranger? I see ranger_cache::set_global_range() and the class ctor
takes a gimple_ranger as argument. Would calling the function be
the right way to do it? (I couldn't find anything on the Wiki.)
I haven't had time to write a new modern wiki yet... probably won't
until well after stage 1 closes either.
And no, thats the internal cache of the ranger, which you dont have
access to.
At the moment, there isnt a way to inject a "new" global range out of
the blue. It hasnt been a requirement before, but wouldn't be
particularly difficult.
Short answer, there is no current equivalent of set_range_info()
because we dont have a persistent global cache in the same way, and
the ranger doesnt set the RANGE_INFO field at this point.. we havent
gotten to the point of reworking that yet.
So the question is, what exactly are you trying to do? I presume you
are analyzing a strlen statement, and are making range conclusions
about either the result or some of the operands are that are not
otherwise exposed in the code?
And you are looking to have those values injected into the rangers
calculations as refinements?
Yes, exactly. This is used both for optimization and for warnings
(both to enable them and to suppress false positives).
Besides folding strlen and sprintf calls into constants, the strlen
pass sets the range of the results of the calls when the lengths of
the arguments/output are known to be at least some number of bytes.
Besides exposing optimization opportunities downstream, the pass
uses this information to either issue or suppress warnings like in
the sprintf case above. -Wformat-overflow issues warnings either
based on string lengths (or ranges) if they're known, or based on
the sizes of the arrays the strings are stored in when nothing is
known about the lengths. The array size is used as the worst case
estimate, which can cause false positives.
Here's an example of a downstream optimization made possible by
this. Because i's range is known, the range of the snprintf result
is also known (the range is computed for all snprintf arguments).
The strlen pass sets the range and subsequent passes eliminate
the unreachable code.
void f (int i)
{
if (i < 100 || 9999 < i)
i = 100;
int n = __builtin_snprintf (0, 0, "%i", i);
if (n < 3 || 4 < n)
__builtin_abort ();
}
Martin
That seems like somethings that the range code for builtins should take
care of rather than be calculated externally and injected
they are just like any other builtins that if the range of certain
arguments are known, you can produce a result?
The strlen optimization is more involved in that it keeps track of
the state of the strings as they're being created by calls to string
functions like strcpy (or single and multi-byte stores). So the only
way for Ranger to figure out the range of the result of a strlen call
is to query the strlen pass. That might work via a callback but it
would of course only work while the strlen pass is running. The same
goes for the sprintf results (which are based on the strlen data).
It might be good enough as long as the ranges can also be exported
to downstream passes.
It might be possible to compute this on demand too (and introduce
something like a strlen_range_query) but it would need reworking
the strlen pass. I like the on-demand design but in the strlen
case I'm not sure the benefits would justify the costs.
I gather the range of i
being known to be [100,9999] means we know something about n. The the
ranger would be able to do that anywhere and everywhere, not just within
this pass. One of the ranger goals is to centralize all range tracking
and the warning pass can continue to check the ranges and issue warnings
when the ranges arent what it likes.
In this simple case, yes. But consider a more involved example:
char a[8];
void f (int i)
{
if (i < 100 || 9999 < i) i = 100;
memcpy (a, "123", 3); // strlen(a) is in [3, 7]
int n = snprintf (0, 0, "%s %i", a, i); // n is in [7, 12]
if (n < 7 || 12 < n) // folded to false
abort ();
}
To compute n's range you need to know not only that of i but also
the length of a (or its range in this case). That's what the strlen
optimization enables.
There may be other hoops you have to continue to jump thru, i don't know
the specifics of what you deal with, but at least the basics of ranges
produced from a builtin sounds like something that should be internal
to the range engine.
It might help to meet to help each other come up to speed on what
we're doing, what we're planning, and how to best integrate the work
and enable future enhancements. I still know very little about
the Ranger internals and it might be helpful to you to learn more
about some of the warnings I work on and the optimizations that
enable them (and what I'd like to do in the future).
Martin