On 01/25/2018 07:16 PM, Martin Sebor wrote:
> PR tree-optimization/83776 - [6/7/8 Regression] missing
> -Warray-bounds indexing past the end of a string literal,
> identified a not-so-recent improvement to constant propagation
> as the reason for GCC no longer being able to detect out-of-
> bounds accesses to string literals.  The root cause is that
> the change caused accesses to strings to be transformed into
> MEM_REFs that the -Warray-bounds checker isn't prepared to
> handle.  A simple example is:
> 
>   int h (void)
>   {
>     const char *p = "1234";
>     return p[16];   // missing -Warray-bounds
>   }
> 
> To fix the regression the attached patch extends the array bounds
> checker to handle the small subset of MEM_REF expressions that
> refer to string literals but stops of short of doing more than
> that.  There are outstanding gaps in the detection that the patch
> intentionally doesn't handle.  They are either caused by other
> regressions (PR 84047) or by other latent bugs/limitations, or
> by limitations in the approach I took to try to keep the patch
> simple.  I hope to address some of those in a follow-up patch
> for GCC 9.
> 
> Martin
> 
> gcc-83776.diff
> 
> 
> PR tree-optimization/83776 - [6/7/8 Regression] missing -Warray-bounds 
> indexing past the end of a string literal
> 
> gcc/ChangeLog:
> 
>       PR tree-optimization/83776
>       * tree-vrp.c (vrp_prop::check_mem_ref): New function.
>       (check_array_bounds): Call it.
> 
> gcc/testsuite/ChangeLog:
> 
>       PR tree-optimization/83776
>       * gcc.dg/Warray-bounds-27.c: New test.
>       * gcc.dg/Warray-bounds-28.c: New test.
>       * gcc.dg/Warray-bounds-29.c: New test.
> 
> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
> index 3294bde..b2e45c9 100644
> --- a/gcc/tree-vrp.c
> +++ b/gcc/tree-vrp.c
> @@ -4763,6 +4763,7 @@ class vrp_prop : public ssa_propagation_engine
>    void vrp_finalize (bool);
>    void check_all_array_refs (void);
>    void check_array_ref (location_t, tree, bool);
> +  void check_mem_ref (location_t, tree);
>    void search_for_addr_array (tree, location_t);
>  
>    class vr_values vr_values;
> @@ -4781,6 +4782,7 @@ class vrp_prop : public ssa_propagation_engine
>    void extract_range_from_phi_node (gphi *phi, value_range *vr)
>      { vr_values.extract_range_from_phi_node (phi, vr); }
>  };
> +
>  /* Checks one ARRAY_REF in REF, located at LOCUS. Ignores flexible arrays
>     and "struct" hacks. If VRP can determine that the
>     array subscript is a constant, check if it is outside valid
> @@ -4915,6 +4917,179 @@ vrp_prop::check_array_ref (location_t location, tree 
> ref,
>      }
>  }
>  
> +/* Checks one MEM_REF in REF, located at LOCATION, for out-of-bounds
> +   references to string constants.  If VRP can determine that the array
> +   subscript is a constant, check if it is outside valid range.
> +   If the array subscript is a RANGE, warn if it is non-overlapping
> +   with valid range.
> +   IGNORE_OFF_BY_ONE is true if the MEM_REF is inside an ADDR_EXPR.  */
This function doesn't have IGNORE_OFF_BY_ONE as a parameter.  Drop it
from the comment.


> +
> +void
> +vrp_prop::check_mem_ref (location_t location, tree ref)
> +{
> +  if (TREE_NO_WARNING (ref))
> +    return;
> +
> +  tree arg = TREE_OPERAND (ref, 0);
> +  tree cstoff = TREE_OPERAND (ref, 1);
> +  tree varoff = NULL_TREE;
> +
> +  const offset_int maxobjsize = tree_to_shwi (max_object_size ());
> +
> +  /* The string constant bounds in bytes.  Initially set to [0, MAXOBJSIZE]
> +     until a tighter bound is determined.  */
> +  offset_int strbounds[2];
> +  strbounds[1] = maxobjsize;
> +  strbounds[0] = -strbounds[1] - 1;
> +
> +  /* The minimum and maximum intermediate offset.  For a reference
> +     to be valid, not only does the final offset/subscript must be
> +     in bounds but all intermediate offsets must be as well. */
> +  offset_int ioff = wi::to_offset (fold_convert (ptrdiff_type_node, cstoff));
> +  offset_int extrema[2] = { 0, wi::abs (ioff) };
> +
> +  /* The range of the byte offset into the reference.  */
> +  offset_int offrange[2] = { 0, 0 };
> +
> +  value_range *vr = NULL;
> +
> +  /* Determine the offsets and increment OFFRANGE for the bounds of each.  */
> +  while (TREE_CODE (arg) == SSA_NAME)
> +    {
> +      gimple *def = SSA_NAME_DEF_STMT (arg);
> +      if (!is_gimple_assign (def))
> +     {
> +       if (tree var = SSA_NAME_VAR (arg))
> +         arg = var;
> +       break;
> +     }
What's the point of looking at the underlying SSA_NAME_VAR here? I can't
see how that's ever helpful.  You'll always exit the loop at this point
which does something like

if (TREE_CODE (arg) == ADDR_EXPR)
  {
     do something interesting
  }
else
  return;

ISTM that any time you dig into SSA_NAME_VAR (arg) what you're going to
get back is some kind of _DECL node -- I'm not aware of a case where
you're going to get back an ADDR_EXPR.




> +
> +      tree_code code = gimple_assign_rhs_code (def);
> +      if (code == POINTER_PLUS_EXPR)
> +     {
> +       arg = gimple_assign_rhs1 (def);
> +       varoff = gimple_assign_rhs2 (def);
> +     }
> +      else if (code == ASSERT_EXPR)
> +     {
> +       arg = TREE_OPERAND (gimple_assign_rhs1 (def), 0);
> +       continue;
> +     }
> +      else
> +     return;
> +
> +      if (TREE_CODE (varoff) != SSA_NAME)
> +     break;
> +
> +      vr = get_value_range (varoff);
> +      if (!vr || vr->type == VR_UNDEFINED || !vr->min || !vr->max)
> +     break;
Doesn't this let VR_ANTI_RANGE through?  Why not instead

  if (!vr || vr->type != VR_RANGE || !vr->min || !vr->max)

?  I'm having trouble convincing myself the subsequent code will DTRT
for an anti-range.




> +
> +      if (TREE_CODE (vr->min) != INTEGER_CST
> +          || TREE_CODE (vr->max) != INTEGER_CST)
> +        break;
> +
> +      offset_int ofr[] = {
> +     wi::to_offset (fold_convert (ptrdiff_type_node, vr->min)),
> +     wi::to_offset (fold_convert (ptrdiff_type_node, vr->max))
> +      };
> +
> +      if (vr->type == VR_RANGE)
> +     {
> +       if (ofr[0] < ofr[1])
> +         {
> +           offrange[0] += ofr[0];
> +           offrange[1] += ofr[1];
> +         }
> +       else
> +         {
> +           offrange[0] += strbounds[0];
> +           offrange[1] += strbounds[1];
> +         }
When can the ELSE clause above happen for a VR_RANGE?


> +  /* At level 2 check also intermediate offsets.  */
> +  int i = 0;
> +  if (extrema[i] < -strbounds[1]
> +      || extrema[i = 1] > strbounds[1] + eltsize)
> +    {
> +      HOST_WIDE_INT tmpidx = extrema[i].to_shwi () / eltsize.to_shwi ();
> +
> +      warning_at (location, OPT_Warray_bounds,
> +               "intermediate array offset %wi is outside array bounds "
> +               "of %qT",
> +               tmpidx,  strtype);
> +      TREE_NO_WARNING (ref) = 1;
> +    }
> +}
This seems ill-advised.  All that matters is the actual index used in
the dereference.  Intermediate values (ie, address computations) along
the way are uninteresting -- we may form an address out of the bounds of
the array as an intermediate computation.  But the actual memory
reference will be within the range.

I think there's some questions to answer here and likely another iteration.

jeff

Reply via email to