On 07/01/2016 08:15 PM, Martin Sebor wrote:
The attached patch enhances compile-time checking for buffer overflow
and output truncation in non-trivial calls to the sprintf family of
functions under a new option -Wformat-length=[12].  This initial
patch handles printf directives with string, integer, and simple
floating arguments but eventually I'd like to extend it all other
functions and directives for which it makes sense.

On the whole I think this looks good.

+      /* Check the whole format strings and ist arguments for possible
+        buffer overflow or truncation.  */

"its"

 struct function_format_info
 {
+  built_in_function fncode;
   int format_type;                     /* type of format (printf, scanf, etc.) 
*/
   unsigned HOST_WIDE_INT format_num;   /* number of format argument */
   unsigned HOST_WIDE_INT first_arg_num;        /* number of first arg (zero 
for varargs) */
+  /* The destination object size or -1 if unknown.  */
+  unsigned HOST_WIDE_INT objsize;
+
+  /* True for functions whose output is bounded by a size argument
+     (e.g., snprintf and vsnprintf).  */
+  bool bounded;
 };

While you're here could you fix the existing comments to be formatted like the ones you're adding (in front of the thing they're documenting, etc.)?

+static void
+check_format_length (location_t,
+                    format_check_results *,
+                    const function_format_info *,
+                    const format_char_info *,
+                    const char *, size_t, size_t,
+                    const conversion_spec *, tree);

Only break the line before the function name for definitions, not declarations.

+    unsigned char c = chr & 0xff;
+    return flags [c / (CHAR_BIT * sizeof *flags)]
+      & (1 << (c % (CHAR_BIT * sizeof *flags)));

> +    unsigned char c = chr & 0xff;
> +    flags [c / (CHAR_BIT * sizeof *flags)]
> +      |= (1 << (c % (CHAR_BIT * sizeof *flags)));

Multi-line expressions should be wrapped in parentheses for correct formatting. Also, no space before [] brackets.

+/* Return the logarithm of tree node X in BASE, incremented by 1 when
+   the optional PLUS sign is True, plus the length of the octal ('0')
+   or hexadecimal ('0x') prefix when PREFIX is True.  Return -1 when
+   X cannot be represented.  */

Maybe this could be clearer as
/* Return the number of characters required to print X, which must be
   an INTEGER_CST, in BASE.  PLUS indicates whether a plus sign should
   be prepended to positive numbers, and PREFIX indicates whether an
   octal ('O') or hexadecimal ('0x') prefix should be prepended to
   nonzero numbesr.  Return -1 if X cannot be represented.  */

+  if (prefix && absval)

This surprised me, but I checked and printf really does not seem to print 0x0.

+/* Return a range representing the minimum and maximum number of bytes
+   that the conversion specification SPEC will write on output for the
+   integer argument ARG.  */

Should indicate that ARG can be null, and what happens in that case. It's not entirely clear to me actually why it would be NULL sometimes.

+    width = (TREE_CODE (spec->star_width) == INTEGER_CST)
+            ? tree_to_shwi (spec->star_width) : 0;

+    prec = (TREE_CODE (spec->star_precision) == INTEGER_CST)
+           ? tree_to_shwi (spec->star_precision) : 0;

Wrapping. Lose parens around the condition, and wrap the whole expression.

+  /* A type of the argument to the directive, either deduced from
+     the actual non-constant argument if one is known, or from
+     the directive itself when none has been provided because it's
+     a va_list.  */
+  tree argtype = NULL_TREE;

"The type" maybe? Also, the structure of this function gets confusing around this point. It looks like we set argtype for everything except integer constants, then have a big block handling nonnull argtype and returning, then we have another block dealing with the other case. I think it would be clearer to check for INTEGER_CST first, deal with it, and then doing the type-based estimation.

+  else if (TREE_CODE (arg) == INTEGER_CST)
+    res.bounded = true;

The assignment appears to be repeated at the end of the function.

+    width = (TREE_CODE (spec->star_width) == INTEGER_CST)
+            ? tree_to_shwi (spec->star_width) : 0;

+    prec = (TREE_CODE (spec->star_precision) == INTEGER_CST)
+            ? tree_to_shwi (spec->star_precision) : 0;

See other instances. These lines occur several times in multiple functions.

+  int expdigs = -1;    /* Number of exponent digits or -1 when unknown.  */
+  int negative = -1;   /* 1 when arg < 0, 0 when arg >= 0, -1 when unknown.  */

I think we should avoid end-of-line comments. I tend to sometimes feel they are ok-ish in struct definitions, but let's not have them in code.

+      res.min = (0 < negative || spec->get_flag ('+') || spec->get_flag (' '))
+       + 1 /* unit */ + (prec < 0 ? 7 : prec ? prec + 1 : 0)
+       + 2 /* e+ */ + (logexpdigs < 2 ? 2 : logexpdigs);

Wrapping.

+      /* The maximum depends on the magnitude of the value but it's
+        at most 316 bytes for double and 4940 for long double, plus
+        precision if non-negative, or 6.  */
+      /* res.max = (spec->get_flag ('L') ? 4934 : 310) */
+      /*       + (prec < 0 ? 6 : prec ? prec + 1 : 0); */
+      res.max = expdigs + (prec < 0 ? 6 : prec ? prec + 1 : 0);

Not sure what the commented code is trying to tell me.

+      res.max = (spec->get_flag ('L') ? 4934 : 310)
+       + (prec < 0 ? 6 : prec ? prec + 1 : 0);

Wrapping.

+         int nul = arg && TREE_CODE (arg) == INTEGER_CST
+           ? integer_zerop (arg) : -1;

Wrapping.

+  else if (tree slen = arg ? c_strlen (arg, 1) : NULL_TREE)

Can c_strlen return NULL_TREE? If not I'd prefer to just test for arg in the if, and have the slen declaration moved inside it.

+/* FIXME: Suppress the warning to avoid having to change all
+   the format_char_info arrays below and to reduce the footprint
+   of the patch until it's been reviewed and accepted.  */

Please make sure this is removed in the next iteration...

Isn't this easy to fix anyway, replace "NULL }" with "NULL, NULL }"?

+#pragma GCC diagnostic pop

Also don't forget this one.

+  /* Number of characters written by the formatting function, exact,
+     minimum and maximum when an exact number cannot be determined.

I think this would be slightly better as "The number of characters written by the formatting function, either the exact number, or, if it cannot be determined, minimum and maximum."

+     Setting the minimum to a negative value disables all length
+     tracking for the remainder of the format string.
+     Setting either of the other two members disables the exact or
+     maximum length tracking, respectively, but continues to track
+     the maximum.  */

That's also slightly unclear. (Setting either of the other two to what?) Also, what indicates that we have an exact value rather than min/max?

+  /* True when the range given by NUMBER_CHARS_MIN and NUMBER_CHARS_MAX

"if" not "when" is preferrable I think; this occurs fairly consistently throughout the patch.

+  /* Increment the number of output characters by N.  */
+  void inc_number_chars (int n = 1) {

Not sure we have figured out how to format member functions but this isn't it. Two linebreaks missing I think.

 void
-check_function_format (tree attrs, int nargs, tree *argarray)
+check_function_format (const_tree fndecl, const_tree attrs, int nargs, tree 
*argarray)

80-character limit.

+  /* Avoid issuing one set of diagnostics during parsing, and another
+     (possibly duplicate) set when expanding printf built-ins.  There
+     are tradeoffs between the diagnostics issued during parsing and
+     later: the former doesn't benefit from constant propagation or
+     range information from the VRP pass, and the latter may not have
+     the original types of all the arguments.  As a result, for
+     arguments of narrower types with no range information that are
+     promoted to wider types the diagnostics during parsing can reduce
+     the rate of false positives by assuming the range of values of
+     the original type as opposed to the one after promotion.
+  */

Comment terminator goes at the end of the previous line.

+  /* The size of the destination as in snprintf(dest, size, ...).  */
+  unsigned HOST_WIDE_INT dstsize = ~(unsigned HOST_WIDE_INT)0;
+  /* The size of the destination determined by __builtin_object_size.  */
+  unsigned HOST_WIDE_INT objsize = ~(unsigned HOST_WIDE_INT)0;

This is HOST_WIDE_INT_M1U. Occurs in a few other places too.

+                 dstsize = warn_format_length < 2
+                   ? wi::fits_uhwi_p (max) ? max.to_uhwi () : max.to_shwi ()
+                   : wi::fits_uhwi_p (min) ? min.to_uhwi () : min.to_shwi ();

Wrapping.

+             if (res->number_chars < 0)
+               fmtstr = info->bounded
+                 ? "output may be truncated at or before format character "
+                   "%qc at offset %qlu past the end of a region of size %qlu"
+                 : "writing format character %qc at offset %qlu "
+                   "in a region of size %qlu";
+             else
+               fmtstr = info->bounded
+                 ? "output truncated at format character %qc at offset %qlu "
+                   "just past the end of a region of size %qlu"
+                 : "writing format character %qc at offset %qlu "
+                   "just past the end of a region of size %qlu";

Wrapping.

+     formt directives or arguments encountered during processing

"format"

+     a possible overflow ("may write"), a certain overlow somewhere "past

"overflow"

+
+      bool boundrange = res->number_chars_min < 0
+       || (res->number_chars_min < res->number_chars_max);

Wrapping. Also useless parens around the second condition.

+      const char* fmtstr = info->bounded
+       ? (res->number_chars < 0 && boundrange
+          ? "output may be truncated while copying format string "
+            "into a region of size %qlu"
+          : "output truncated while copying format string "
+            "into a region of size %qlu")
+       : (res->number_chars < 0 ? boundrange
+          ? "may write a terminating nul past the end "
+            "of a region of size %qlu"
+          : "writing a terminating nul past the end "
+            "of a region of size %qlu"
+          : "writing a terminating nul just past the end "
+            "of a region of size %qlu");

Wrapping.

+static void
+check_format_length (location_t                  loc,
+                    format_check_results       *res,
+                    const function_format_info *info,
+                    const format_char_info     *fci,
+                    const char                 *cvtbeg,
+                    size_t                      cvtlen,
+                    size_t                      offset,
+                    const conversion_spec      *cvtspec,
+                    tree                        arg)

Please don't format arguments like this.

+             const char* fmtstr = info->bounded
+               ? "%<%%%.*s%> directive output truncated %qi bytes "
+                 "into a region of size %qlu"
+               : "%<%%%.*s%> directive writing at least %qi bytes "
+                 "into a region of size %qlu";
+               warning_at (loc, OPT_Wformat_length_, fmtstr,
+                           (int)cvtlen, cvtbeg, fmtres.min,
+                           (unsigned long)navail);

+             const char* fmtstr = info->bounded
+               ? "%<%%%.*s%> directive output may be truncated between "
+                 "%qi and %qi bytes into a region of size %qlu"
+               : "%<%%%.*s%> directive writing between %qi and %qi bytes "
+                 "into a region of size %qlu";

Wrapping.

+         if (info->bounded)
+           fmtstr = 1 < fmtres.min
+             ? "%<%%%.*s%> directive output truncated while writing "
+               "%qi bytes into a region of size %qlu"
+             : "%<%%%.*s%> directive output truncated while writing "
+               "%qi byte into a region of size %qlu";
+         else
+           fmtstr = 1 < fmtres.min
+             ? "%<%%%.*s%> directive writing %qi bytes "
+               "into a region of size %qlu"
+             : "%<%%%.*s%> directive writing %qi byte "
+               "into a region of size %qlu";

Here too.

+  struct fmtresult {

Newline before brace?

+are assumed to be 1 character long.  Enabling optimization will in most
+cases improve the accuracy of the warning, although in some cases it may
+also result in false positives.

I think this ought to be "In most cases, enabling optimization improves the accuracy..."

+NUL character

No idea whether NUL needs markup. Existing documentation is inconsistent, both for NUL and NULL.

+buffer in an inforational note following the warning.

"informational"

+arguments can be bounded by specifying the precision in the fortmat

"format"

+the minimum buffer size.  For exampe, if @var{a} and @var{b} above can

"example"

diff --git a/gcc/genmodes.c b/gcc/genmodes.c
index 788031b..79783fd 100644
--- a/gcc/genmodes.c
+++ b/gcc/genmodes.c
@@ -486,7 +486,7 @@ make_vector_modes (enum mode_class cl, unsigned int width,
 {
   struct mode_data *m;
   struct mode_data *v;
-  char buf[8];
+  char buf[12];
   unsigned int ncomponents;
   enum mode_class vclass = vector_class (cl);

Avoiding the warning? Real problem or false positive?

+void
+lhd_check_format_length (const_tree ARG_UNUSED (fndecl),
+                        const_tree ARG_UNUSED (attrs),
+                        int ARG_UNUSED (nargs),
+                        tree * ARG_UNUSED (argarray))
+{
+}

I'm tempted to say let's just write C++, i.e. drop arg names altogether in obviously unused cases like this.

+
+      // inform (0, "executing pass: %s", pass->name);
+
       if (execute_one_pass (pass) && pass->sub)
-        execute_pass_list_1 (pass->sub);
+       {
+         // inform (0, "executing subpass: %s", pass->sub->name);
+         execute_pass_list_1 (pass->sub);
+       }

This ought to be removed.

+  /* Ra creates a range of signed char for A [idx].  A different
+     value is used each time to prevent the ranges from intesecting

"intersecting"

Beyond these I have no objections to the patch but ideally a C frontend maintainer would given an explicit ack as well.


Bernd

Reply via email to