Re: [PATCH] Make strlen range computations more conservative

Bernd Edlinger Sat, 20 Oct 2018 02:17:14 -0700

On 10/17/18 11:56 PM, Jeff Law wrote:
> On 10/12/18 9:34 PM, Bernd Edlinger wrote:
>> On 10/12/18 16:55, Jeff Law wrote:
>>> On 9/15/18 2:43 AM, Bernd Edlinger wrote:
>>>> Hi,
>>>>
>>>> this is an update on my strlen range patch (V7).  Again re-based and
>>>> retested to current trunk.
>>>>
>>>> I am aware that Martin wants to re-factor the interface of get_range_strlen
>>>> and have no objections against, but I'd suggest that to be a follow-up 
>>>> patch.
>>>>
>>>> I might suggest to rename one of the two get_range_strlen functions at the
>>>> same time as it is rather confusing to have to count the parameters in 
>>>> order
>>>> to tell which function is meant.
>>>>
>>>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>>>> Is it OK for trunk?
>>>>
>>>>
>>>> Thanks
>>>> Bernd.
>>>>
>>>>
>>>> changelog-range-strlen-v7.txt
>>>>
>>>> gcc:
>>>> 2018-08-26  Bernd Edlinger  <bernd.edlin...@hotmail.de>
>>>>
>>>>    * gimple-fold.c (looks_like_a_char_array_without_typecast_p): New
>>>>    helper function for strlen range estimations.
>>>>    (get_range_strlen): Use looks_like_a_char_array_without_typecast_p
>>>>    for warnings, but use GIMPLE semantics otherwise.
>>>>    * tree-ssa-strlen.c (maybe_set_strlen_range): Use GIMPLE semantics.
>>>>    (get_min_string_length): Avoid not NUL terminated string literals.
>>> The introduction of looks_like_a_char_array_without_typecast_p is
>>> probably a good thing.  Too  much code is already implemented inline
>>> within get_range_strlen.
>>>
>>> It looks like you added handling of ARRAY_RANGE_REF.  I don't know how
>>> often they come up in practice, but handling it seems like a reasonable
>>> extension to what we're doing.  Bonus points if it's triggering with any
>>> kind of consistency.
>>>
>>
>> I did only want to be consistent with get_inner_reference here,
>> but did not have encountered these, probably only an Ada thing?
> Trying to be consistent with get_inner_reference is fine :-)  GCC
> supports case ranges as an extension for C/C++.  No clue if they're
> natively supported by Ada or any other langauge.
> 
> 
> 
>>
>>> I actually prefer Martin's unification of type/fuzzy into a single
>>> enumeration to describe the desired behavior.  Doing it with two args
>>> where some values are mutually exclusive is just asking for trouble.
>>> Though I like that you called out the values that are mutually exclusive.
>>>
>>> I definitely want to look at how your patch and Martin's differ on the
>>> handling of flexible array members -- clearly we must avoid setting a
>>> range in that case.  I'm surprised this didn't trigger a failure in the
>>> testsuite though.  Martin's work in this space did.
>>>
>>> The bugfix in get_min_string_length looks like it probably stands on its
>>> own.
>>>
>>> I'm still evaluating the two approaches...
>>>
>>
>> One thing I should mention is, that there is still one place where 
>> opportunistic
>> range info influence conde gen.  I mean at least with my patch.
> ACK.   That's soemthing Martin's patch does address.  AT least it's
> supposed to.


Okay, based on my previous patch I can of course do the same.

See attached.  This was bootstrapped and reg-tested together with my
previous patch.  The only "regression" was pr79376.c, which is xfailed
because the test case is expecting the return value to be in the limits given
by in the opportunistic range info.

While I think the strlen return optimization will be safe with this patch,
I have however still a philosophical problem with it, because s[n]printf
is a highly complex piece of software, and we take it away the right
to return a failure code, when it has to because of an implementation bug.

>>
>> That is the return value from sprintf is using the range info from the
>> warning, and uses that to set the range info of the result.
>> In try_substitute_return_value, which uses the range info that was
>> from the warnings and feeds that into set_range_info.
> Right.  In Martin's work we have enough range info to distinguish
> between the range info for warnings and the true range info and only use
> the latter in the call to set_range_info.
> 
> 

Well I have tried the test cases from Martins patch, and all except one
work fine for me, and pass with my patch-set as well.

The problematic one is strlenopt-59.c (in his patch, my patch has picked
the same name, unfortunately).

The difference is how object declarations are handled.  While my patch
does not try to solve that problem at all, his patch does probably look
at the declaration size to improve the strict limits.

I am not totally against it, but do not feel any need to implement that
feature in the same patch together with a function interface change, and
a code-correctness fix.

 From the test case it looks like the globals are comdat objects, because there
is no initialization.  You can declare "char a3[3];" and "char a3[100];" in
different translation units and it will be a3[100] at run-time.

For me the red line here is basically, that the strlen optimization should
_not_ be more aggressive than the loop-niter optimization, thus the lackmus
test is, would the test case pass if strlen is implemented as:

#define strlen(c) ({ __SIZE_TYPE__ _n; for(_n=0; (c)[_n]; _n++); _n; })

Well, it does not.  But that should probably considered as a goal.



Bernd.

2018-10-19  Bernd Edlinger  <bernd.edlin...@hotmail.de>

	* gimple-ssa-sprintf.c (result_range::strict_max): New member.
	(format_result::operator++, format_result::operator+=): Remove.
	(fmtresult::adjust_for_width_or_precision, format_none,
	format_percent, format_integer, format_floating,
	format_character, format_string): Handle strict_max.
	(get_string_length): Compute strict_max.
	(format_directive): Accumulate strict_max.
	(is_call_safe): Use strict_max.

testsuite:
2018-10-19  Bernd Edlinger  <bernd.edlin...@hotmail.de>

	* gcc.dg/tree-ssa/pr79376.c: Add xfail.

diff -Npur gcc/gimple-ssa-sprintf.c gcc/gimple-ssa-sprintf.c
--- gcc/gimple-ssa-sprintf.c	2018-10-04 04:55:10.000000000 +0200
+++ gcc/gimple-ssa-sprintf.c	2018-10-19 14:55:44.971280820 +0200
@@ -191,6 +191,8 @@ struct result_range
      UNLIKELY == MAX.  UNLIKELY is used to control the return value
      optimization but not in diagnostics.  */
   unsigned HOST_WIDE_INT unlikely;
+  /* Conservative maximum range value.  */
+  unsigned HOST_WIDE_INT strict_max;
 };
 
 /* The result of a call to a formatted function.  */
@@ -228,45 +230,8 @@ struct format_result
      avoid issuing duplicate warnings while finishing the processing
      of a call.  WARNED also disables the return value optimization.  */
   bool warned;
-
-  /* Preincrement the number of output characters by 1.  */
-  format_result& operator++ ()
-  {
-    return *this += 1;
-  }
-
-  /* Postincrement the number of output characters by 1.  */
-  format_result operator++ (int)
-  {
-    format_result prev (*this);
-    *this += 1;
-    return prev;
-  }
-
-  /* Increment the number of output characters by N.  */
-  format_result& operator+= (unsigned HOST_WIDE_INT);
 };
 
-format_result&
-format_result::operator+= (unsigned HOST_WIDE_INT n)
-{
-  gcc_assert (n < HOST_WIDE_INT_MAX);
-
-  if (range.min < HOST_WIDE_INT_MAX)
-    range.min += n;
-
-  if (range.max < HOST_WIDE_INT_MAX)
-    range.max += n;
-
-  if (range.likely < HOST_WIDE_INT_MAX)
-    range.likely += n;
-
-  if (range.unlikely < HOST_WIDE_INT_MAX)
-    range.unlikely += n;
-
-  return *this;
-}
-
 /* Return the value of INT_MIN for the target.  */
 
 static inline HOST_WIDE_INT
@@ -524,6 +489,7 @@ struct fmtresult
     range.max = min;
     range.likely = min;
     range.unlikely = min;
+    range.strict_max = HOST_WIDE_INT_M1U;
   }
 
   /* Construct a FMTRESULT object with MIN, MAX, and LIKELY counters.
@@ -538,6 +504,7 @@ struct fmtresult
     range.max = max;
     range.likely = max < likely ? min : likely;
     range.unlikely = max;
+    range.strict_max = HOST_WIDE_INT_M1U;
   }
 
   /* Adjust result upward to reflect the RANGE of values the specified
@@ -617,6 +584,8 @@ fmtresult::adjust_for_width_or_precision
 	     adjusted.  Otherwise leave it at what it was before.  */
 	  knownrange = minadjusted;
 	}
+      if (range.strict_max < (unsigned HOST_WIDE_INT)adjust[1])
+	range.strict_max = adjust[1];
     }
 
   if (warn_level > 1 && type)
@@ -948,6 +917,7 @@ static fmtresult
 format_none (const directive &, tree, vr_values *)
 {
   fmtresult res (0);
+  res.range.strict_max = res.range.max;
   return res;
 }
 
@@ -957,6 +927,7 @@ static fmtresult
 format_percent (const directive &, tree, vr_values *)
 {
   fmtresult res (1);
+  res.range.strict_max = res.range.max;
   return res;
 }
 
@@ -1319,6 +1290,7 @@ format_integer (const directive &dir, tr
 	}
 
       res.range.unlikely = res.range.max;
+      res.range.strict_max = res.range.max;
 
       /* Bump up the counters if WIDTH is greater than LEN.  */
       res.adjust_for_width_or_precision (dir.width, dirtype, base,
@@ -1486,6 +1458,8 @@ format_integer (const directive &dir, tr
     }
 
   res.range.unlikely = res.range.max;
+  res.range.strict_max = res.range.max;
+
   res.adjust_for_width_or_precision (dir.width, dirtype, base,
 				     (sign | maybebase) + (base == 16));
   res.adjust_for_width_or_precision (dir.prec, dirtype, base,
@@ -1796,6 +1770,8 @@ format_floating (const directive &dir, c
       return fmtresult ();
     }
 
+  res.range.strict_max = res.range.unlikely;
+
   /* Bump up the byte counters if WIDTH is greater.  */
   res.adjust_for_width_or_precision (dir.width);
   return res;
@@ -1907,6 +1883,8 @@ format_floating (const directive &dir, t
 	 such as inf/infinity (e.g., Solaris).  */
       res.knownrange = dir.known_width_and_precision ();
 
+      res.range.strict_max = res.range.unlikely;
+
       /* Adjust the range for width but ignore precision.  */
       res.adjust_for_width_or_precision (dir.width);
 
@@ -1991,6 +1969,8 @@ format_floating (const directive &dir, t
       res.range.unlikely += target_mb_len_max () - 1;
     }
 
+  res.range.strict_max = res.range.unlikely;
+
   res.adjust_for_width_or_precision (dir.width);
   return res;
 }
@@ -2014,6 +1994,7 @@ get_string_length (tree str, unsigned el
 	 we know its length.  */
       fmtresult res (tree_to_shwi (slen));
       res.nonstr = NULL_TREE;
+      res.range.strict_max = res.range.max;
       return res;
     }
   else if (!slen
@@ -2086,6 +2067,10 @@ get_string_length (tree str, unsigned el
 	 flexible array member, such as in struct S { char a[4]; };  */
       res.range.unlikely = flexarray ? HOST_WIDE_INT_MAX : res.range.max;
 
+      if (!get_range_strlen (str, lenrange, eltsize, true)
+	  && tree_fits_uhwi_p (lenrange[1]))
+	res.range.strict_max = tree_to_uhwi (lenrange[1]);
+
       return res;
     }
 
@@ -2130,7 +2115,7 @@ format_character (const directive &dir,
 
 	      /* A wide character in the ASCII range most likely results
 		 in a single byte, and only unlikely in up to MB_LEN_MAX.  */
-	      res.range.max = one_2_one_ascii ? 1 : target_mb_len_max ();;
+	      res.range.max = one_2_one_ascii ? 1 : target_mb_len_max ();
 	      res.range.likely = 1;
 	      res.range.unlikely = target_mb_len_max ();
 	      res.mayfail = !one_2_one_ascii;
@@ -2164,6 +2149,8 @@ format_character (const directive &dir,
       res.knownrange = true;
     }
 
+  res.range.strict_max = res.range.unlikely;
+
   /* Bump up the byte counters if WIDTH is greater.  */
   return res.adjust_for_width_or_precision (dir.width);
 }
@@ -2209,6 +2196,11 @@ format_string (const directive &dir, tre
 	     is bounded by MB_LEN_MAX * wcslen (S).  */
 	  res.range.max *= target_mb_len_max ();
 	  res.range.unlikely = res.range.max;
+	  if (res.range.strict_max < target_int_max () / target_mb_len_max ())
+	    res.range.strict_max *= target_mb_len_max ();
+	  else
+	    res.range.strict_max = HOST_WIDE_INT_M1U;
+
 	  /* It's likely that the the total length is not more that
 	     2 * wcslen (S).*/
 	  res.range.likely = res.range.min * 2;
@@ -2282,9 +2274,14 @@ format_string (const directive &dir, tre
 	  if (slen.range.likely < target_int_max ())
 	    slen.range.likely *= 2;
 
-	  if (slen.range.likely < target_int_max ())
+	  if (slen.range.unlikely < target_int_max ())
 	    slen.range.unlikely *= target_mb_len_max ();
 
+	  if (slen.range.strict_max < target_int_max () / target_mb_len_max ())
+	    slen.range.strict_max *= target_mb_len_max ();
+	  else
+	    slen.range.strict_max = HOST_WIDE_INT_M1U;
+
 	  /* A non-empty wide character conversion may fail.  */
 	  if (slen.range.max > 0)
 	    res.mayfail = true;
@@ -2355,6 +2352,8 @@ format_string (const directive &dir, tre
      of bytes on output isn't bounded by precision, set NONSTR.  */
   if (slen.nonstr && slen.range.min < (unsigned HOST_WIDE_INT)dir.prec[0])
     res.nonstr = slen.nonstr;
+  if ((unsigned HOST_WIDE_INT)dir.prec[1] < res.range.strict_max)
+    res.range.strict_max = dir.prec[1];
 
   /* Bump up the byte counters if WIDTH is greater.  */
   return res.adjust_for_width_or_precision (dir.width);
@@ -2366,6 +2365,7 @@ static fmtresult
 format_plain (const directive &dir, tree, vr_values *)
 {
   fmtresult res (dir.len);
+  res.range.strict_max = res.range.max;
   return res;
 }
 
@@ -2752,6 +2752,10 @@ format_directive (const sprintf_dom_walk
 
   /* Compute the range of lengths of the formatted output.  */
   fmtresult fmtres = dir.fmtfunc (dir, dir.arg, vr_values);
+  if(res->range.strict_max < HOST_WIDE_INT_M1U - fmtres.range.strict_max)
+    res->range.strict_max += fmtres.range.strict_max;
+  else
+    res->range.strict_max = HOST_WIDE_INT_M1U;
 
   /* Record whether the output of all directives is known to be
      bounded by some maximum, implying that their arguments are
@@ -3554,11 +3558,8 @@ is_call_safe (const sprintf_dom_walker::
   /* The minimum return value.  */
   retval[0] = res.range.min;
 
-  /* The maximum return value is in most cases bounded by RES.RANGE.MAX
-     but in cases involving multibyte characters could be as large as
-     RES.RANGE.UNLIKELY.  */
-  retval[1]
-    = res.range.unlikely < res.range.max ? res.range.max : res.range.unlikely;
+  /* The maximum return value.  */
+  retval[1] = res.range.strict_max;
 
   /* Adjust the number of bytes which includes the terminating nul
      to reflect the return value of the function which does not.
diff -Npur gcc/testsuite/gcc.dg/tree-ssa/pr79376.c gcc/testsuite/gcc.dg/tree-ssa/pr79376.c
--- gcc/testsuite/gcc.dg/tree-ssa/pr79376.c	2017-04-15 22:07:47.000000000 +0200
+++ gcc/testsuite/gcc.dg/tree-ssa/pr79376.c	2018-10-19 11:16:46.991725994 +0200
@@ -105,5 +105,5 @@ void test_string_and_array (int i, struc
   }
 }
 
-/* { dg-final { scan-tree-dump-not "failure_on_line" "optimized"} }
+/* { dg-final { scan-tree-dump-not "failure_on_line" "optimized" { xfail *-*-* } } }
    { dg-final { scan-tree-dump-times "keep_call_on_line" 21 "optimized"} } */

Re: [PATCH] Make strlen range computations more conservative

Reply via email to