On 10/8/19 5:51 PM, Martin Sebor wrote: > Attached is a resubmission of the -Wrestrict implementation for > the sprintf family of functions. The original patch was posted > in 2017 but never approved. This revision makes only a few minor > changes to the original code, mostly necessitated by rebasing on > the top of trunk. > > The description from the original posting still applies today: > > The enhancement works by first determining the base object (or > pointer) from the destination of the sprintf call, the constant > offset into the object (and subobject for struct members), and > its size. For each %s argument, it then computes the same data. > If it determines that overlap between the two is possible it > stores the data for the directive argument (including the size > of the argument) for later processing. After the whole call and > format string have been processed, the code then iterates over > the stored directives and their arguments and compares the size > and length of the argument against the function's overall output. > If they overlap it issues a warning. > > The solution is pretty simple. The only details that might be > worth calling out are the addition of a few utility functions some > of which at first glance look like they could be replaced by calls > to existing utilities: > > * array_elt_at_offset > * field_at_offset > * get_origin_and_offset > * alias_offset I'm a bit surprised we don't already have routines that perform these functions.
> > Specifically, get_origin_and_offset looks like a dead ringer for > get_addr_base_and_unit_offset, except since the former is only > used for warnings it is less conservative. It also works with > SSA_NAMEs. This is also the function I expect to need to make > changes to (and fix bugs in). The rest of the functions are > general utilities that could perhaps be moved to tree.c at some > point when there is a use for them elsewhere (I have some work > in progress that might need at least one of them). > > Another likely question worth addressing is why the sprintf > overlap detection isn't handled in gimple-ssa-warn-restrict.c. > There is an opportunity for code sharing between the two "passes" > but it will require some fairly intrusive changes to the latter. > Those feel out of scope for the initial solution. > > Finally, because of new dependencies between existing classes in > the file, some code had to be moved around within it a bit. That > contributed to the size of the patch making the changes seem more > extensive than they really are. > > Tested on x86_64-linux with Binutils/GDB and Glibc. > > Martin > > The original submission: > https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00036.html > > gcc-83688.diff > > PR middle-end/83688 - check if buffers may overlap when copying strings using > sprintf > > gcc/ChangeLog: > > PR middle-end/83688 > * gimple-ssa-sprintf.c (format_result::alias_info): New struct. > (directive::argno): New member. > (format_result::aliases, format_result::alias_count): New data members. > (format_result::append_alias): New member function. > (fmtresult::dst_offset): New data member. > (pass_sprintf_length::call_info::dst_origin): New data member. > (pass_sprintf_length::call_info::dst_field, dst_offset): Same. > (char_type_p, array_elt_at_offset, field_at_offset): New functions. > (get_origin_and_offset): Same. > (format_string): Call it. > (format_directive): Call append_alias and set directive argument > number. > (maybe_warn_overlap): New function. > (pass_sprintf_length::compute_format_length): Call it. > (pass_sprintf_length::handle_gimple_call): Initialize new members. > * gcc/tree-ssa-strlen.c (): Also enable when -Wrestrict is on. > > gcc/testsuite/ChangeLog: > > PR tree-optimization/35503 > * gcc.dg/tree-ssa/builtin-sprintf-warn-23.c: New test. > > diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c > index b11d7989d5e..b47ed019615 100644 > --- a/gcc/gimple-ssa-sprintf.c > +++ b/gcc/gimple-ssa-sprintf.c > +void > +format_result::append_alias (const directive &d, HOST_WIDE_INT off, > + const result_range &resrng) > +{ > + unsigned cnt = alias_count + 1; > + alias_info *ar = XNEWVEC (alias_info, cnt); So just a question. Is there a particular reason why you're self-managing the vector here? Would it be easier to use the vec.h capabilities? I guess it's not that big of a deal since it's pretty well isolated in here. > @@ -2143,6 +2190,249 @@ format_character (const directive &dir, tree arg, > const vr_values *vr_values) > return res.adjust_for_width_or_precision (dir.width); > } > > +/* Return true if TYPE is one of the three narrow character types. */ > +static inline bool > +char_type_p (tree type) > +{ > + return (type == char_type_node > + || type == signed_char_type_node > + || type == unsigned_char_type_node); > +} Presumably you really want to restrict this to the internal notion of char, signed char and unsigned char and not to any object of the right size. Just trying to confirm intent here. + > +/* For an expression X of pointer type, recursively try to find the same > + origin (object or pointer) as Y it references and return such an X. > + When X refers to a struct member, set *FLDOFF to the offset of the > + member from the beginning of the "most derived" object. */ > + > +static tree > +get_origin_and_offset (tree x, HOST_WIDE_INT *fldoff, HOST_WIDE_INT *off) Might want a comment that this is for diagnostics only. Presumably you don't need to get the outer object, hence the different overall structure for this routine relative to get_addr_base_and_unit_offset > @@ -2153,6 +2443,17 @@ format_string (const directive &dir, tree arg, const > vr_values *vr_values) > { > fmtresult res; > > + if (warn_restrict) > + { > + /* See if ARG might alias the destination of the call with > + DST_ORIGIN and DST_FIELD. If so, store the starting offset > + so that the overlap can be determined for certain later, > + when the amount of output of the call (including subsequent > + directives) has been computed. Othewrwise, store HWI_MIN. */ s/Othewrwise/Otherwise/ > @@ -3482,6 +3826,136 @@ parse_directive (call_info &info, > return dir.len; > } > > +/* Diagnose overlap between destination and %s directive arguments. */ > + > +static void > +maybe_warn_overlap (call_info &info, format_result *res) > +{ > + /* Two vectors of 1-based indices corresponding to either certainly > + or possibly aliasing arguments. */ > + auto_vec<int, 16> aliasarg[2]; > + > + /* Go through the array of potentially aliasing directives and collect > + argument numbers of those that do or may overlap the destination > + object given the full result. */ > + for (unsigned i = 0; i != res->alias_count; ++i) > + { > + const format_result::alias_info &alias = res->aliases[i]; > + > + enum { Possible = -1, None = 0, Certain = 1 } overlap = None; GNU standards mandate upper case for enum constants. > + > + /* If the precision is zero there is no overlap. (This only > + considers %s directives and ignores %n.) */ > + if (alias.dir.prec[0] == 0 && alias.dir.prec[1] == 0) > + continue; > + > + if (alias.offset == HOST_WIDE_INT_MAX > + || info.dst_offset == HOST_WIDE_INT_MAX) > + overlap = Possible; > + else if (alias.offset == info.dst_offset) > + overlap = alias.dir.prec[0] == 0 ? Possible : Certain; > + else > + { > + /* Dtermine overlap from the range of output and offsets > + into the same destination as the source, and rule out > + impossible overlap. */ s/Dtermine/Determine/ I don't think the questions above really affect whether or not this should go forward. So OK with the nits fixed. Jeff