Re: [PATCH GCC]Proving no-trappness for array ref in tree if-conv using loop niter information.

Bin.Cheng Fri, 06 May 2016 02:41:02 -0700

On Tue, May 3, 2016 at 11:08 AM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Tue, May 3, 2016 at 12:01 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Mon, May 2, 2016 at 10:00 AM, Richard Biener
>> <richard.guent...@gmail.com> wrote:
>>> On Fri, Apr 29, 2016 at 5:05 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>>>> On Fri, Apr 29, 2016 at 12:16 PM, Richard Biener
>>>> <richard.guent...@gmail.com> wrote:
>>>>> On Thu, Apr 28, 2016 at 2:56 PM, Bin Cheng <bin.ch...@arm.com> wrote:
>>>>>> Hi,
>>>>>> Tree if-conversion sometimes cannot convert conditional array reference 
>>>>>> into unconditional one.  Root cause is GCC conservatively assumes newly 
>>>>>> introduced array reference could be out of array bound and thus 
>>>>>> trapping.  This patch improves the situation by proving the converted 
>>>>>> unconditional array reference is within array bound using loop niter 
>>>>>> information.  To be specific, it checks every index of array reference 
>>>>>> to see if it's within bound in ifcvt_memrefs_wont_trap.  This patch also 
>>>>>> factors out base_object_writable checking if the base object is writable 
>>>>>> or not.
>>>>>> Bootstrap and test on x86_64 and aarch64, is it OK?
>>>>>
>>>>> I think you miss to handle the case optimally where the only
>>>>> non-ARRAY_REF idx is the dereference of the
>>>>> base-pointer for, say, p->a[i].  In this case we can use
>>>>> base_master_dr to see if p is unconditionally dereferenced
>>>> Yes, will pick up this case.
>>>>
>>>>> in the loop.  You also fail to handle the case where we have
>>>>> MEM_REF[&x].a[i] that is, you see a decl base.
>>>> I am having difficulty in creating this case for ifcvt, any advices?  
>>>> Thanks.
>>>
>>> Sth like
>>>
>>> float a[128];
>>> float foo (int n, int i)
>>> {
>>>   return (*((float(*)[n])a))[i];
>>> }
>>>
>>> should do the trick (w/o the component-ref).  Any other type-punning
>>> would do it, too.
>>>
>>>>> I suppose for_each_index should be fixed for this particular case (to
>>>>> return true), same for TARGET_MEM_REF TMR_BASE.
>>>>>
>>>>> +  /* The case of nonconstant bounds could be handled, but it would be
>>>>> +     complicated.  */
>>>>> +  if (TREE_CODE (low) != INTEGER_CST || !integer_zerop (low)
>>>>> +      || !high || TREE_CODE (high) != INTEGER_CST)
>>>>> +    return false;
>>>>> +
>>>>>
>>>>> handling of a non-zero but constant low bound is important - otherwise
>>>>> all this is a no-op for Fortran.  It
>>>>> shouldn't be too difficult to handle after all.  In fact I think your
>>>>> code does handle it correctly already.
>>>>>
>>>>> +  if (!init || TREE_CODE (init) != INTEGER_CST
>>>>> +      || !step || TREE_CODE (step) != INTEGER_CST || integer_zerop 
>>>>> (step))
>>>>> +    return false;
>>>>>
>>>>> step == 0 should be easy to handle as well, no?  The index will simply
>>>>> always be 'init' ...
>>>>>
>>>>> +  /* In case the relevant bound of the array does not fit in type, or
>>>>> +     it does, but bound + step (in type) still belongs into the range of 
>>>>> the
>>>>> +     array, the index may wrap and still stay within the range of the 
>>>>> array
>>>>> +     (consider e.g. if the array is indexed by the full range of
>>>>> +     unsigned char).
>>>>> +
>>>>> +     To make things simpler, we require both bounds to fit into type, 
>>>>> although
>>>>> +     there are cases where this would not be strictly necessary.  */
>>>>> +  if (!int_fits_type_p (high, type) || !int_fits_type_p (low, type))
>>>>> +    return false;
>>>>> +
>>>>> +  low = fold_convert (type, low);
>>>>>
>>>>> please use wide_int for all of this.
>>>> Now I use wi:fits_to_tree_p instead of int_fits_type_p. But I am not
>>>> sure what's the meaning by "handle "low = fold_convert (type, low);"
>>>> related code in wide_int".   Do you mean to use tree_int_cst_compare
>>>> instead of tree_int_cst_compare in the following code?
>>>
>>> I don't think you need any kind of fits-to-type check here.  You'd simply
>>> use to_widest () when operating on / comparing with high/low.
>> But what would happen if low/high and init/step are different in type
>> sign-ness?  Anything special I need to do before using wi::ltu_p or
>> wi::lts_p directly?
>
> You want to use to_widest (min) which extends according to sign to
> an "infinite" precision signed integer.  So you can then use the new
> operator< overloads as well.
>
Hi,
Here is the updated patch.  It includes below changes according to
review comments:


1) It uses widest_int for all INTEGER_CST tree computations, which
simplifies the patch a lot.
2) It covers array with non-zero low bound, which is important for Fortran.
3) It picks up a boundary case so that ifc-11.c/vect-23.c/vect-24.c
can be handled.
4) It also checks within bound array reference inside a structure like
p->a[i] by using base_master_dr in tree-if-conv.c so that ifc-12.c can
be handled.

It leaves two other review comments not addressed:
1) It doesn't handle array reference whose idx is a wrapping SCEV.
Because only read is safe and vectorizer itself may be confused by it
now.
2) It doesn't handle array reference in the form of
"MEM[(float[0:(sizetype) ((long int) SAVE_EXPR <m.2> + -1)]
*)&b][i_1];". To handle this case, existing code in
array_at_struct_end_p as well as this patch need to be improved.  I
tend to handle it in an independent patch.

With these changes, now cases pr61194.c and vect-23.c can be
vectorized, I removed XFAIL from them.  Also vect-mask-store-move-1.c
is affected and not vectorized now, this is tricky and it exposes a
known bug PR65206 in vectorizer's dependence analyzer.  This should be
handled independently too.  Also gcc.dg/vect/vect-24.c now is XPASS on
AArch64, but not x86_64.  Root cause is dom2 jump threads first
iteration of loop thus idx_within_array_bound is failed.  I didn't
check if jump threading is good in this case, either I remove the
XFAIL mark.  I tend to improve idx_within_array_bound by using VRP
information in a following patch.
Otherwise bootstrap and test on x86_64 and AArch64 are fine.  Is this
version OK?

Thanks,
bin

2016-05-04  Bin Cheng  <bin.ch...@arm.com>

    * tree-if-conv.c (tree-ssa-loop.h): Include header file.
    (tree-ssa-loop-niter.h): Ditto.
    (idx_within_array_bound, ref_within_array_bound): New functions.
    (ifcvt_memrefs_wont_trap): Check if array ref is within bound.
    Factor out check on writable base object to ...
    (base_object_writable): ... here.

gcc/testsuite/ChangeLog
2016-05-04  Bin Cheng  <bin.ch...@arm.com>

    * gcc.dg/tree-ssa/ifc-9.c: New test.
    * gcc.dg/tree-ssa/ifc-10.c: New test.
    * gcc.dg/tree-ssa/ifc-11.c: New test.
    * gcc.dg/tree-ssa/ifc-12.c: New test.
    * gcc.dg/vect/pr61194.c: Remove XFAIL.
    * gcc.dg/vect/vect-23.c: Remove XFAIL.
    * gcc.dg/vect/vect-mask-store-move-1.c: Revise test check.

diff --git a/gcc/testsuite/gcc.dg/vect/pr61194.c 
b/gcc/testsuite/gcc.dg/vect/pr61194.c
index 8d74e00..f7c71b9 100644
--- a/gcc/testsuite/gcc.dg/vect/pr61194.c
+++ b/gcc/testsuite/gcc.dg/vect/pr61194.c
@@ -38,4 +38,4 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail 
*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-23.c 
b/gcc/testsuite/gcc.dg/vect/vect-23.c
index 44bed75..e463f1b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-23.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-23.c
@@ -123,5 +123,5 @@ int main (void)
   return main1 ();
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { xfail 
*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" } } */
 /* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 
"vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-24.c 
b/gcc/testsuite/gcc.dg/vect/vect-24.c
index 09a6d7e..d27410b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-24.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-24.c
@@ -123,5 +123,5 @@ int main (void)
   return main1 ();
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { xfail 
*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" } } */
 /* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 
"vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c 
b/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c
index f5cae4f..f928dbf 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c
@@ -15,4 +15,4 @@ void foo (int n)
       }
 }
 
-/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 6 "vect" { 
target { i?86-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 4 "vect" { 
target { i?86-*-* x86_64-*-* } } } } */
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 32ced16..5527a7c 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -106,6 +106,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "tree-data-ref.h"
 #include "tree-scalar-evolution.h"
+#include "tree-ssa-loop.h"
+#include "tree-ssa-loop-niter.h"
 #include "tree-ssa-loop-ivopts.h"
 #include "tree-ssa-address.h"
 #include "dbgcnt.h"
@@ -740,6 +742,105 @@ hash_memrefs_baserefs_and_store_DRs_read_written_info 
(data_reference_p a)
     }
 }
 
+/* Return TRUE if can prove the index IDX of an array reference REF is
+   within array bound.  Return false otherwise.  */
+
+static bool
+idx_within_array_bound (tree ref, tree *idx, void *dta)
+{
+  bool overflow;
+  widest_int niter, valid_niter, delta, wi_step;
+  tree ev, init, step;
+  tree low, high;
+  struct loop *loop = (struct loop*) dta;
+
+  /* Only support within-bound access for array references.  */
+  if (TREE_CODE (ref) != ARRAY_REF)
+    return false;
+
+  /* For arrays at the end of the structure, we are not guaranteed that they
+     do not really extend over their declared size.  However, for arrays of
+     size greater than one, this is unlikely to be intended.  */
+  if (array_at_struct_end_p (ref))
+    return false;
+
+  ev = analyze_scalar_evolution (loop, *idx);
+  ev = instantiate_parameters (loop, ev);
+  init = initial_condition (ev);
+  step = evolution_part_in_loop_num (ev, loop->num);
+
+  if (!init || TREE_CODE (init) != INTEGER_CST
+      || (step && TREE_CODE (step) != INTEGER_CST))
+    return false;
+
+  low = array_ref_low_bound (ref);
+  high = array_ref_up_bound (ref);
+
+  /* The case of nonconstant bounds could be handled, but it would be
+     complicated.  */
+  if (TREE_CODE (low) != INTEGER_CST
+      || !high || TREE_CODE (high) != INTEGER_CST)
+    return false;
+
+  /* Check if the intial idx is within bound.  */
+  if (wi::to_widest (init) < wi::to_widest (low)
+      || wi::to_widest (init) > wi::to_widest (high))
+    return false;
+
+  /* The idx is always within bound.  */
+  if (!step || integer_zerop (step))
+    return true;
+
+  if (!max_loop_iterations (loop, &niter))
+    return false;
+
+  if (wi::to_widest (step) < 0)
+    {
+      delta = wi::to_widest (init) - wi::to_widest (low);
+      wi_step = -wi::to_widest (step);
+    }
+  else
+    {
+      delta = wi::to_widest (high) - wi::to_widest (init);
+      wi_step = wi::to_widest (step);
+    }
+
+  valid_niter = wi::div_floor (delta, wi_step, SIGNED, &overflow);
+  /* The iteration space of idx is within array bound.  */
+  if (!overflow && niter <= valid_niter)
+    return true;
+
+  return false;
+}
+
+/* Return TRUE if ref is a within bound array reference.  */
+
+static bool
+ref_within_array_bound (gimple *stmt, tree ref)
+{
+  struct loop *loop = loop_containing_stmt (stmt);
+
+  gcc_assert (loop != NULL);
+  return for_each_index (&ref, idx_within_array_bound, loop);
+}
+
+
+/* Given a memory reference expression T, return TRUE if base object
+   it refers to is writable.  The base object of a memory reference
+   is the main object being referenced, which is returned by function
+   get_base_address.  */
+
+static bool
+base_object_writable (tree ref)
+{
+  tree base_tree = get_base_address (ref);
+
+  return (base_tree
+         && DECL_P (base_tree)
+         && decl_binds_to_current_def_p (base_tree)
+         && !TREE_READONLY (base_tree));
+}
+
 /* Return true when the memory references of STMT won't trap in the
    if-converted code.  There are two things that we have to check for:
 
@@ -787,8 +888,13 @@ ifcvt_memrefs_wont_trap (gimple *stmt, 
vec<data_reference_p> drs)
   if (DR_W_UNCONDITIONALLY (*master_dr))
     return true;
 
-  /* If a is unconditionally accessed then ... */
-  if (DR_RW_UNCONDITIONALLY (*master_dr))
+  /* If a is unconditionally accessed then ...
+
+     Even a is conditional access, we can treat it as an unconditional
+     one if it's an array reference and all its index are within array
+     bound.  */
+  if (DR_RW_UNCONDITIONALLY (*master_dr)
+      || ref_within_array_bound (stmt, DR_REF (a)))
     {
       /* an unconditional read won't trap.  */
       if (DR_IS_READ (a))
@@ -799,16 +905,11 @@ ifcvt_memrefs_wont_trap (gimple *stmt, 
vec<data_reference_p> drs)
       if (base_master_dr
          && DR_BASE_W_UNCONDITIONALLY (*base_master_dr))
        return PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES);
-      else
-       {
-         /* or the base is know to be not readonly.  */
-         tree base_tree = get_base_address (DR_REF (a));
-         if (DECL_P (base_tree)
-             && decl_binds_to_current_def_p (base_tree)
-             && ! TREE_READONLY (base_tree))
-           return PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES);
-       }
+      /* or the base is known to be not readonly.  */
+      else if (base_object_writable (DR_REF (a)))
+       return PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES);
     }
+
   return false;
 }
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c
new file mode 100644
index 0000000..70b7422
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -fdump-tree-ifcvt-stats" } */
+/* { dg-require-visibility "" } */
+
+int b[256] = {0}, y;
+void bar (int *);
+int foo (int x, int n)
+{
+  int i;
+  int a[128];
+
+  for (i = 0; i < n; i++)
+    {
+      a[i] = i;
+      if (x > i)
+       b[i] = y;
+    }
+  bar (a);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Applying if-conversion" 1 "ifcvt" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c
new file mode 100644
index 0000000..bacf428
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -fdump-tree-ifcvt-stats" } */
+/* { dg-require-visibility "" } */
+
+int a[1024] = {0.0};
+int b[1024] = {0.0};
+int c[1024] = {0.0};
+int foo (float *x)
+{
+  int i = 0;
+
+  for (i = 0; i < 1024; i++)
+    {
+      c[i] = (x[i] > 0.0) ? a[i] : b[i];
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Applying if-conversion" 1 "ifcvt" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c
new file mode 100644
index 0000000..89d42b4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -fdump-tree-ifcvt-stats" } */
+/* { dg-require-visibility "" } */
+
+struct st
+{
+  int a[1024];
+  int b[1024];
+};
+
+struct st s = {0};
+int foo (int x)
+{
+  int i;
+  struct st *p = &s;
+
+  for (i = 0; i < 1024; i++)
+    {
+      if (x > i)
+       p->a[i] = p->b[i];
+    }
+
+  return 0;
+}
+/* { dg-final { scan-tree-dump-times "Applying if-conversion" 1 "ifcvt" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c
new file mode 100644
index 0000000..24c19c0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -fdump-tree-ifcvt-stats" } */
+/* { dg-require-visibility "" } */
+
+extern int b[256], y;
+void bar (int *, int);
+int foo (int x, int n)
+{
+  int i;
+  int a[128];
+
+  for (i = 0; i < n; i++)
+    {
+      a[i] = i;
+      if (x > i)
+       y = b[i];
+    }
+  bar (a, y);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Applying if-conversion" 1 "ifcvt" } } */

Re: [PATCH GCC]Proving no-trappness for array ref in tree if-conv using loop niter information.

Reply via email to