On 01/25/2018 07:16 PM, Martin Sebor wrote: > PR tree-optimization/83776 - [6/7/8 Regression] missing > -Warray-bounds indexing past the end of a string literal, > identified a not-so-recent improvement to constant propagation > as the reason for GCC no longer being able to detect out-of- > bounds accesses to string literals. The root cause is that > the change caused accesses to strings to be transformed into > MEM_REFs that the -Warray-bounds checker isn't prepared to > handle. A simple example is: > > int h (void) > { > const char *p = "1234"; > return p[16]; // missing -Warray-bounds > } > > To fix the regression the attached patch extends the array bounds > checker to handle the small subset of MEM_REF expressions that > refer to string literals but stops of short of doing more than > that. There are outstanding gaps in the detection that the patch > intentionally doesn't handle. They are either caused by other > regressions (PR 84047) or by other latent bugs/limitations, or > by limitations in the approach I took to try to keep the patch > simple. I hope to address some of those in a follow-up patch > for GCC 9. > > Martin > > gcc-83776.diff > > > PR tree-optimization/83776 - [6/7/8 Regression] missing -Warray-bounds > indexing past the end of a string literal > > gcc/ChangeLog: > > PR tree-optimization/83776 > * tree-vrp.c (vrp_prop::check_mem_ref): New function. > (check_array_bounds): Call it. > > gcc/testsuite/ChangeLog: > > PR tree-optimization/83776 > * gcc.dg/Warray-bounds-27.c: New test. > * gcc.dg/Warray-bounds-28.c: New test. > * gcc.dg/Warray-bounds-29.c: New test. > > diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c > index 3294bde..b2e45c9 100644 > --- a/gcc/tree-vrp.c > +++ b/gcc/tree-vrp.c > @@ -4763,6 +4763,7 @@ class vrp_prop : public ssa_propagation_engine > void vrp_finalize (bool); > void check_all_array_refs (void); > void check_array_ref (location_t, tree, bool); > + void check_mem_ref (location_t, tree); > void search_for_addr_array (tree, location_t); > > class vr_values vr_values; > @@ -4781,6 +4782,7 @@ class vrp_prop : public ssa_propagation_engine > void extract_range_from_phi_node (gphi *phi, value_range *vr) > { vr_values.extract_range_from_phi_node (phi, vr); } > }; > + > /* Checks one ARRAY_REF in REF, located at LOCUS. Ignores flexible arrays > and "struct" hacks. If VRP can determine that the > array subscript is a constant, check if it is outside valid > @@ -4915,6 +4917,179 @@ vrp_prop::check_array_ref (location_t location, tree > ref, > } > } > > +/* Checks one MEM_REF in REF, located at LOCATION, for out-of-bounds > + references to string constants. If VRP can determine that the array > + subscript is a constant, check if it is outside valid range. > + If the array subscript is a RANGE, warn if it is non-overlapping > + with valid range. > + IGNORE_OFF_BY_ONE is true if the MEM_REF is inside an ADDR_EXPR. */ This function doesn't have IGNORE_OFF_BY_ONE as a parameter. Drop it from the comment.
> + > +void > +vrp_prop::check_mem_ref (location_t location, tree ref) > +{ > + if (TREE_NO_WARNING (ref)) > + return; > + > + tree arg = TREE_OPERAND (ref, 0); > + tree cstoff = TREE_OPERAND (ref, 1); > + tree varoff = NULL_TREE; > + > + const offset_int maxobjsize = tree_to_shwi (max_object_size ()); > + > + /* The string constant bounds in bytes. Initially set to [0, MAXOBJSIZE] > + until a tighter bound is determined. */ > + offset_int strbounds[2]; > + strbounds[1] = maxobjsize; > + strbounds[0] = -strbounds[1] - 1; > + > + /* The minimum and maximum intermediate offset. For a reference > + to be valid, not only does the final offset/subscript must be > + in bounds but all intermediate offsets must be as well. */ > + offset_int ioff = wi::to_offset (fold_convert (ptrdiff_type_node, cstoff)); > + offset_int extrema[2] = { 0, wi::abs (ioff) }; > + > + /* The range of the byte offset into the reference. */ > + offset_int offrange[2] = { 0, 0 }; > + > + value_range *vr = NULL; > + > + /* Determine the offsets and increment OFFRANGE for the bounds of each. */ > + while (TREE_CODE (arg) == SSA_NAME) > + { > + gimple *def = SSA_NAME_DEF_STMT (arg); > + if (!is_gimple_assign (def)) > + { > + if (tree var = SSA_NAME_VAR (arg)) > + arg = var; > + break; > + } What's the point of looking at the underlying SSA_NAME_VAR here? I can't see how that's ever helpful. You'll always exit the loop at this point which does something like if (TREE_CODE (arg) == ADDR_EXPR) { do something interesting } else return; ISTM that any time you dig into SSA_NAME_VAR (arg) what you're going to get back is some kind of _DECL node -- I'm not aware of a case where you're going to get back an ADDR_EXPR. > + > + tree_code code = gimple_assign_rhs_code (def); > + if (code == POINTER_PLUS_EXPR) > + { > + arg = gimple_assign_rhs1 (def); > + varoff = gimple_assign_rhs2 (def); > + } > + else if (code == ASSERT_EXPR) > + { > + arg = TREE_OPERAND (gimple_assign_rhs1 (def), 0); > + continue; > + } > + else > + return; > + > + if (TREE_CODE (varoff) != SSA_NAME) > + break; > + > + vr = get_value_range (varoff); > + if (!vr || vr->type == VR_UNDEFINED || !vr->min || !vr->max) > + break; Doesn't this let VR_ANTI_RANGE through? Why not instead if (!vr || vr->type != VR_RANGE || !vr->min || !vr->max) ? I'm having trouble convincing myself the subsequent code will DTRT for an anti-range. > + > + if (TREE_CODE (vr->min) != INTEGER_CST > + || TREE_CODE (vr->max) != INTEGER_CST) > + break; > + > + offset_int ofr[] = { > + wi::to_offset (fold_convert (ptrdiff_type_node, vr->min)), > + wi::to_offset (fold_convert (ptrdiff_type_node, vr->max)) > + }; > + > + if (vr->type == VR_RANGE) > + { > + if (ofr[0] < ofr[1]) > + { > + offrange[0] += ofr[0]; > + offrange[1] += ofr[1]; > + } > + else > + { > + offrange[0] += strbounds[0]; > + offrange[1] += strbounds[1]; > + } When can the ELSE clause above happen for a VR_RANGE? > + /* At level 2 check also intermediate offsets. */ > + int i = 0; > + if (extrema[i] < -strbounds[1] > + || extrema[i = 1] > strbounds[1] + eltsize) > + { > + HOST_WIDE_INT tmpidx = extrema[i].to_shwi () / eltsize.to_shwi (); > + > + warning_at (location, OPT_Warray_bounds, > + "intermediate array offset %wi is outside array bounds " > + "of %qT", > + tmpidx, strtype); > + TREE_NO_WARNING (ref) = 1; > + } > +} This seems ill-advised. All that matters is the actual index used in the dereference. Intermediate values (ie, address computations) along the way are uninteresting -- we may form an address out of the bounds of the array as an intermediate computation. But the actual memory reference will be within the range. I think there's some questions to answer here and likely another iteration. jeff