On 10/31/19 10:31 AM, Jeff Law wrote:
On 10/8/19 5:51 PM, Martin Sebor wrote:
Attached is a resubmission of the -Wrestrict implementation for
the sprintf family of functions.  The original patch was posted
in 2017 but never approved.  This revision makes only a few minor
changes to the original code, mostly necessitated by rebasing on
the top of trunk.

The description from the original posting still applies today:

   The enhancement works by first determining the base object (or
   pointer) from the destination of the sprintf call, the constant
   offset into the object (and subobject for struct members), and
   its size.  For each %s argument, it then computes the same data.
   If it determines that overlap between the two is possible it
   stores the data for the directive argument (including the size
   of the argument) for later processing.  After the whole call and
   format string have been processed, the code then iterates over
   the stored directives and their arguments and compares the size
   and length of the argument against the function's overall output.
   If they overlap it issues a warning.

The solution is pretty simple.  The only details that might be
worth calling out are the addition of a few utility functions some
of which at first glance look like they could be replaced by calls
to existing utilities:

  *  array_elt_at_offset
  *  field_at_offset
  *  get_origin_and_offset
  *  alias_offset
I'm a bit surprised we don't already have routines that perform these
functions.

I double-checked but couldn't find corresponding alternatives.
I might need these functions in other places in the future so
if/when I do I will move them somewhere more central (and
possibly also generalize them in the process).



Specifically, get_origin_and_offset looks like a dead ringer for
get_addr_base_and_unit_offset, except since the former is only
used for warnings it is less conservative.  It also works with
SSA_NAMEs.  This is also the function I expect to need to make
changes to (and fix bugs in).  The rest of the functions are
general utilities that could perhaps be moved to tree.c at some
point when there is a use for them elsewhere (I have some work
in progress that might need at least one of them).

Another likely question worth addressing is why the sprintf
overlap detection isn't handled in gimple-ssa-warn-restrict.c.
There is an opportunity for code sharing between the two "passes"
but it will require some fairly intrusive changes to the latter.
Those feel out of scope for the initial solution.

Finally, because of new dependencies between existing classes in
the file, some code had to be moved around within it a bit.  That
contributed to the size of the patch making the changes seem more
extensive than they really are.

Tested on x86_64-linux with Binutils/GDB and Glibc.

Martin

The original submission:
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00036.html

gcc-83688.diff

PR middle-end/83688 - check if buffers may overlap when copying strings using 
sprintf

gcc/ChangeLog:

        PR middle-end/83688
        * gimple-ssa-sprintf.c (format_result::alias_info): New struct.
        (directive::argno): New member.
        (format_result::aliases, format_result::alias_count): New data members.
        (format_result::append_alias): New member function.
        (fmtresult::dst_offset): New data member.
        (pass_sprintf_length::call_info::dst_origin): New data member.
        (pass_sprintf_length::call_info::dst_field, dst_offset): Same.
        (char_type_p, array_elt_at_offset, field_at_offset): New functions.
        (get_origin_and_offset): Same.
        (format_string): Call it.
        (format_directive): Call append_alias and set directive argument
        number.
        (maybe_warn_overlap): New function.
        (pass_sprintf_length::compute_format_length): Call it.
        (pass_sprintf_length::handle_gimple_call): Initialize new members.
        * gcc/tree-ssa-strlen.c (): Also enable when -Wrestrict is on.

gcc/testsuite/ChangeLog:

        PR tree-optimization/35503
        * gcc.dg/tree-ssa/builtin-sprintf-warn-23.c: New test.

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index b11d7989d5e..b47ed019615 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c

+void
+format_result::append_alias (const directive &d, HOST_WIDE_INT off,
+                            const result_range &resrng)
+{
+  unsigned cnt = alias_count + 1;
+  alias_info *ar = XNEWVEC (alias_info, cnt);
So just a question.  Is there a particular reason why you're
self-managing the vector here?  Would it be easier to use the vec.h
capabilities?  I guess it's not that big of a deal since it's pretty
well isolated in here.

None of the vec templates is safely copyable or assignable (PR 90904)
so the code would be more cumbersome if it did use it than this way.
(For comparison, see class args_loc_t in gimple-ssa-isolate-paths.c
that uses vec).


@@ -2143,6 +2190,249 @@ format_character (const directive &dir, tree arg, const 
vr_values *vr_values)
    return res.adjust_for_width_or_precision (dir.width);
  }
+/* Return true if TYPE is one of the three narrow character types. */
+static inline bool
+char_type_p (tree type)
+{
+  return (type == char_type_node
+         || type == signed_char_type_node
+         || type == unsigned_char_type_node);
+}
Presumably you really want to restrict this to the internal notion of
char, signed char and unsigned char and not to any object of the right
size.  Just trying to confirm intent here.

Size is probably better/more general.


+/* For an expression X of pointer type, recursively try to find the same
+   origin (object or pointer) as Y it references and return such an X.
+   When X refers to a struct member, set *FLDOFF to the offset of the
+   member from the beginning of the "most derived" object.  */
+
+static tree
+get_origin_and_offset (tree x, HOST_WIDE_INT *fldoff, HOST_WIDE_INT *off)
Might want a comment that this is for diagnostics only.  Presumably you
don't need to get the outer object, hence the different overall
structure for this routine relative to get_addr_base_and_unit_offset
@@ -2153,6 +2443,17 @@ format_string (const directive &dir, tree arg, const 
vr_values *vr_values)
  {
    fmtresult res;
+ if (warn_restrict)
+    {
+      /* See if ARG might alias the destination of the call with
+        DST_ORIGIN and DST_FIELD.  If so, store the starting offset
+        so that the overlap can be determined for certain later,
+        when the amount of output of the call (including subsequent
+        directives) has been computed.  Othewrwise, store HWI_MIN.  */
s/Othewrwise/Otherwise/


@@ -3482,6 +3826,136 @@ parse_directive (call_info &info,
    return dir.len;
  }
+/* Diagnose overlap between destination and %s directive arguments. */
+
+static void
+maybe_warn_overlap (call_info &info, format_result *res)
+{
+  /* Two vectors of 1-based indices corresponding to either certainly
+     or possibly aliasing arguments.  */
+  auto_vec<int, 16> aliasarg[2];
+
+  /* Go through the array of potentially aliasing directives and collect
+     argument numbers of those that do or may overlap the destination
+     object given the full result.  */
+  for (unsigned i = 0; i != res->alias_count; ++i)
+    {
+      const format_result::alias_info &alias = res->aliases[i];
+
+      enum { Possible = -1, None = 0, Certain = 1 } overlap = None;
GNU standards mandate upper case for enum constants.



+
+      /* If the precision is zero there is no overlap.  (This only
+        considers %s directives and ignores %n.)  */
+      if (alias.dir.prec[0] == 0 && alias.dir.prec[1] == 0)
+       continue;
+
+      if (alias.offset == HOST_WIDE_INT_MAX
+         || info.dst_offset == HOST_WIDE_INT_MAX)
+       overlap = Possible;
+      else if (alias.offset == info.dst_offset)
+       overlap = alias.dir.prec[0] == 0 ? Possible : Certain;
+      else
+       {
+         /* Dtermine overlap from the range of output and offsets
+            into the same destination as the source, and rule out
+            impossible overlap.  */
s/Dtermine/Determine/

I don't think the questions above really affect whether or not this
should go forward.  So OK with the nits fixed.

Committed in r278098.

Martin

Reply via email to