On 10/8/21 9:12 AM, Aldy Hernandez via Gcc-patches wrote:
The following patch converts the strlen pass from evrp to ranger,
leaving DOM as the last remaining user.
Thanks for doing this. I know I said I'd work on it but I'm still
bogged down in my stage 1 work that's not going so great :( I just
have a few minor comments/questions on the strlen change (inline)
but I am a bit concerned about the test failure.
No additional cleanups have been done. For example, the strlen pass
still has uses of VR_ANTI_RANGE, and the sprintf still passes around
pairs of integers instead of using a proper range. Fixing this
could further improve these passes.
As a further enhancement, if the relevant maintainers deem useful,
the domwalk could be removed from strlen. That is, unless the pass
needs it for something else.
With ranger we are now able to remove the range calculation from
before_dom_children entirely. Just working with the ranger on-demand
catches all the strlen and sprintf testcases with the exception of
builtin-sprintf-warn-22.c which is due to a limitation of the sprintf
code. I have XFAILed the test and documented what the problem is.
builtin-sprintf-warn-22.c is a regression test for a false positive
in Glibc. If it fails we'll have to deal with the Glibc failure
again, which I would rather avoid. Have you checked to see if
Glibc is affected by the change?
It looks like the same problem in the sprintf test triggers a false
positive in gimple-ssa-warn-access.cc so I have added
-Wno-format-overflow until it can be fixed.
I can expand on the false positive if necessary, but the gist is that
this:
_17 = strlen (_132);
_18 = strlen (_136);
_19 = _18 + _17;
if (_19 > 75)
goto <bb 59>; [0.00%]
else
goto <bb 61>; [100.00%]
...dominates the sprintf in BB61. This means that ranger can figure
out that the _17 and _18 are [0, 75]. On the other hand, evrp
returned a range of [0, 9223372036854775805] which presumably the
sprintf code was ignoring as a false positive here:
This is a feature designed to avoid false positives when the sprintf
pass doesn't know anything about the strings (i.e., their lengths
are unconstrained by either the sizes of the arrays they're stored
in or any expressions like asserts involving their lengths).
It sounds like the strlen/ranger improvement partially propagates
constraints from subsequent expressions into the strlen results
but it doesn't go far enough for them to actually fully satisfy
the constraint, which is what in turn triggers the warning.
I.e., in the test:
void g (char *s1, char *s2)
{
char b[1025];
size_t n = __builtin_strlen (s1), d = __builtin_strlen (s2);
if (n + d + 1 >= 1025)
return;
sprintf (b, "%s.%s", s1, s2); // { dg-bogus "\\\[-Wformat-overflow" }
the range of n and d is [0, INF] and so the sprintf call doesn't
trigger a warning. With your change, because their range is
[0, 1023] each (and there's no way to express that their sum
is less than 1025), the warning triggers because it considers
the worst case scenario (the upper bounds of both).
char sizstr[80];
...
...
char *s1 = print_generic_expr_to_str (sizrng[1]);
gcc_checking_assert (strlen (s0) + strlen (s1)
< sizeof sizstr - 4);
sprintf (sizstr, "[%s, %s]", s0, s1);
The warning triggers with:
gimple-ssa-warn-access.cc: In member function ‘void
{anonymous}::pass_waccess::maybe_check_access_sizes(rdwr_map*, tree, tree,
gimple*)’:
gimple-ssa-warn-access.cc:2916:32: warning: ‘%s’ directive writing up to 75
bytes into a region of size between 2 and 77 [-Wformat-overflow=]
2916 | sprintf (sizstr, "[%s, %s]", s0, s1);
| ^~~~~~~~~~
gimple-ssa-warn-access.cc:2916:23: note: ‘sprintf’ output between 5 and 155
bytes into a destination of size 80
2916 | sprintf (sizstr, "[%s, %s]", s0, s1);
| ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
Yes, that does look like the same problem. It's a side-effect
of the checking_assert. What's troubling is that it's one that
has exactly the opposite effect of what's intended: it causes
warnings when it's intended to avoid them, which was the main
goal of the strlen/sprintf integration.
Suppressing the warning in these cases, while technically simple,
would be a design change. We might just have to live with this.
The asserts still work to constrain individual lenghts, they just
won't work for more complex expressions involving relationships
between two or more strings.
On a positive note, these changes found two possible sprintf overflow
bugs in the C++ and Fortran front-ends which I have fixed below.
That's good to hear! :)
Bootstrap and regtested on x86-64 Linux. I also ran it through our
callgrind harness and there was no overall change in overall
compilation time.
OK?
...
@@ -269,7 +270,7 @@ compare_nonzero_chars (strinfo *si, unsigned HOST_WIDE_INT
off,
return -1;
value_range vr;
- if (!rvals->range_of_expr (vr, si->nonzero_chars, si->stmt))
+ if (!rvals->range_of_expr (vr, si->nonzero_chars, stmt))
We need stmt rather than si->stmt because the latter may be different
or null when the former will always refer to the current statement,
correct?
return -1;
value_range_kind rng = vr.kind ();
if (rng != VR_RANGE)
@@ -324,7 +325,8 @@ get_next_strinfo (strinfo *si)
information. */
static int
-get_addr_stridx (tree exp, tree ptr, unsigned HOST_WIDE_INT *offset_out,
+get_addr_stridx (tree exp, gimple *stmt,
+ tree ptr, unsigned HOST_WIDE_INT *offset_out,
range_query *rvals = NULL)
I think there still are quite a few calls to get_addr_stridx() and
get_addr() that don't pass in rvals. I started doing this back in
the GCC 11 cycle but didn't finish the job. Eventually, rvals
should be passed to all their callers (or they made class members
with a ranger member). I mention this in case it has an effect on
the range propagation (since the pass also sets ranges).
{
HOST_WIDE_INT off;
...
@@ -4653,7 +4662,7 @@ count_nonzero_bytes_addr (tree exp, unsigned
HOST_WIDE_INT offset,
&& TREE_CODE (si->nonzero_chars) == SSA_NAME)
{
value_range vr;
- rvals->range_of_expr (vr, si->nonzero_chars, si->stmt);
+ rvals->range_of_expr (vr, si->nonzero_chars, stmt);
Same question about si->stmt. (Just making sure I understand.)
Thanks
Martin