On Tue, May 3, 2016 at 11:08 AM, Richard Biener <richard.guent...@gmail.com> wrote: > On Tue, May 3, 2016 at 12:01 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: >> On Mon, May 2, 2016 at 10:00 AM, Richard Biener >> <richard.guent...@gmail.com> wrote: >>> On Fri, Apr 29, 2016 at 5:05 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: >>>> On Fri, Apr 29, 2016 at 12:16 PM, Richard Biener >>>> <richard.guent...@gmail.com> wrote: >>>>> On Thu, Apr 28, 2016 at 2:56 PM, Bin Cheng <bin.ch...@arm.com> wrote: >>>>>> Hi, >>>>>> Tree if-conversion sometimes cannot convert conditional array reference >>>>>> into unconditional one. Root cause is GCC conservatively assumes newly >>>>>> introduced array reference could be out of array bound and thus >>>>>> trapping. This patch improves the situation by proving the converted >>>>>> unconditional array reference is within array bound using loop niter >>>>>> information. To be specific, it checks every index of array reference >>>>>> to see if it's within bound in ifcvt_memrefs_wont_trap. This patch also >>>>>> factors out base_object_writable checking if the base object is writable >>>>>> or not. >>>>>> Bootstrap and test on x86_64 and aarch64, is it OK? >>>>> >>>>> I think you miss to handle the case optimally where the only >>>>> non-ARRAY_REF idx is the dereference of the >>>>> base-pointer for, say, p->a[i]. In this case we can use >>>>> base_master_dr to see if p is unconditionally dereferenced >>>> Yes, will pick up this case. >>>> >>>>> in the loop. You also fail to handle the case where we have >>>>> MEM_REF[&x].a[i] that is, you see a decl base. >>>> I am having difficulty in creating this case for ifcvt, any advices? >>>> Thanks. >>> >>> Sth like >>> >>> float a[128]; >>> float foo (int n, int i) >>> { >>> return (*((float(*)[n])a))[i]; >>> } >>> >>> should do the trick (w/o the component-ref). Any other type-punning >>> would do it, too. >>> >>>>> I suppose for_each_index should be fixed for this particular case (to >>>>> return true), same for TARGET_MEM_REF TMR_BASE. >>>>> >>>>> + /* The case of nonconstant bounds could be handled, but it would be >>>>> + complicated. */ >>>>> + if (TREE_CODE (low) != INTEGER_CST || !integer_zerop (low) >>>>> + || !high || TREE_CODE (high) != INTEGER_CST) >>>>> + return false; >>>>> + >>>>> >>>>> handling of a non-zero but constant low bound is important - otherwise >>>>> all this is a no-op for Fortran. It >>>>> shouldn't be too difficult to handle after all. In fact I think your >>>>> code does handle it correctly already. >>>>> >>>>> + if (!init || TREE_CODE (init) != INTEGER_CST >>>>> + || !step || TREE_CODE (step) != INTEGER_CST || integer_zerop >>>>> (step)) >>>>> + return false; >>>>> >>>>> step == 0 should be easy to handle as well, no? The index will simply >>>>> always be 'init' ... >>>>> >>>>> + /* In case the relevant bound of the array does not fit in type, or >>>>> + it does, but bound + step (in type) still belongs into the range of >>>>> the >>>>> + array, the index may wrap and still stay within the range of the >>>>> array >>>>> + (consider e.g. if the array is indexed by the full range of >>>>> + unsigned char). >>>>> + >>>>> + To make things simpler, we require both bounds to fit into type, >>>>> although >>>>> + there are cases where this would not be strictly necessary. */ >>>>> + if (!int_fits_type_p (high, type) || !int_fits_type_p (low, type)) >>>>> + return false; >>>>> + >>>>> + low = fold_convert (type, low); >>>>> >>>>> please use wide_int for all of this. >>>> Now I use wi:fits_to_tree_p instead of int_fits_type_p. But I am not >>>> sure what's the meaning by "handle "low = fold_convert (type, low);" >>>> related code in wide_int". Do you mean to use tree_int_cst_compare >>>> instead of tree_int_cst_compare in the following code? >>> >>> I don't think you need any kind of fits-to-type check here. You'd simply >>> use to_widest () when operating on / comparing with high/low. >> But what would happen if low/high and init/step are different in type >> sign-ness? Anything special I need to do before using wi::ltu_p or >> wi::lts_p directly? > > You want to use to_widest (min) which extends according to sign to > an "infinite" precision signed integer. So you can then use the new > operator< overloads as well. > Hi, Here is the updated patch. It includes below changes according to review comments:
1) It uses widest_int for all INTEGER_CST tree computations, which simplifies the patch a lot. 2) It covers array with non-zero low bound, which is important for Fortran. 3) It picks up a boundary case so that ifc-11.c/vect-23.c/vect-24.c can be handled. 4) It also checks within bound array reference inside a structure like p->a[i] by using base_master_dr in tree-if-conv.c so that ifc-12.c can be handled. It leaves two other review comments not addressed: 1) It doesn't handle array reference whose idx is a wrapping SCEV. Because only read is safe and vectorizer itself may be confused by it now. 2) It doesn't handle array reference in the form of "MEM[(float[0:(sizetype) ((long int) SAVE_EXPR <m.2> + -1)] *)&b][i_1];". To handle this case, existing code in array_at_struct_end_p as well as this patch need to be improved. I tend to handle it in an independent patch. With these changes, now cases pr61194.c and vect-23.c can be vectorized, I removed XFAIL from them. Also vect-mask-store-move-1.c is affected and not vectorized now, this is tricky and it exposes a known bug PR65206 in vectorizer's dependence analyzer. This should be handled independently too. Also gcc.dg/vect/vect-24.c now is XPASS on AArch64, but not x86_64. Root cause is dom2 jump threads first iteration of loop thus idx_within_array_bound is failed. I didn't check if jump threading is good in this case, either I remove the XFAIL mark. I tend to improve idx_within_array_bound by using VRP information in a following patch. Otherwise bootstrap and test on x86_64 and AArch64 are fine. Is this version OK? Thanks, bin 2016-05-04 Bin Cheng <bin.ch...@arm.com> * tree-if-conv.c (tree-ssa-loop.h): Include header file. (tree-ssa-loop-niter.h): Ditto. (idx_within_array_bound, ref_within_array_bound): New functions. (ifcvt_memrefs_wont_trap): Check if array ref is within bound. Factor out check on writable base object to ... (base_object_writable): ... here. gcc/testsuite/ChangeLog 2016-05-04 Bin Cheng <bin.ch...@arm.com> * gcc.dg/tree-ssa/ifc-9.c: New test. * gcc.dg/tree-ssa/ifc-10.c: New test. * gcc.dg/tree-ssa/ifc-11.c: New test. * gcc.dg/tree-ssa/ifc-12.c: New test. * gcc.dg/vect/pr61194.c: Remove XFAIL. * gcc.dg/vect/vect-23.c: Remove XFAIL. * gcc.dg/vect/vect-mask-store-move-1.c: Revise test check.
diff --git a/gcc/testsuite/gcc.dg/vect/pr61194.c b/gcc/testsuite/gcc.dg/vect/pr61194.c index 8d74e00..f7c71b9 100644 --- a/gcc/testsuite/gcc.dg/vect/pr61194.c +++ b/gcc/testsuite/gcc.dg/vect/pr61194.c @@ -38,4 +38,4 @@ int main() return 0; } -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail *-*-* } } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-23.c b/gcc/testsuite/gcc.dg/vect/vect-23.c index 44bed75..e463f1b 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-23.c +++ b/gcc/testsuite/gcc.dg/vect/vect-23.c @@ -123,5 +123,5 @@ int main (void) return main1 (); } -/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { xfail *-*-* } } } */ +/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" } } */ /* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-24.c b/gcc/testsuite/gcc.dg/vect/vect-24.c index 09a6d7e..d27410b 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-24.c +++ b/gcc/testsuite/gcc.dg/vect/vect-24.c @@ -123,5 +123,5 @@ int main (void) return main1 (); } -/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { xfail *-*-* } } } */ +/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" } } */ /* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c b/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c index f5cae4f..f928dbf 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c +++ b/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c @@ -15,4 +15,4 @@ void foo (int n) } } -/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 6 "vect" { target { i?86-*-* x86_64-*-* } } } } */ +/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 4 "vect" { target { i?86-*-* x86_64-*-* } } } } */ diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c index 32ced16..5527a7c 100644 --- a/gcc/tree-if-conv.c +++ b/gcc/tree-if-conv.c @@ -106,6 +106,8 @@ along with GCC; see the file COPYING3. If not see #include "cfgloop.h" #include "tree-data-ref.h" #include "tree-scalar-evolution.h" +#include "tree-ssa-loop.h" +#include "tree-ssa-loop-niter.h" #include "tree-ssa-loop-ivopts.h" #include "tree-ssa-address.h" #include "dbgcnt.h" @@ -740,6 +742,105 @@ hash_memrefs_baserefs_and_store_DRs_read_written_info (data_reference_p a) } } +/* Return TRUE if can prove the index IDX of an array reference REF is + within array bound. Return false otherwise. */ + +static bool +idx_within_array_bound (tree ref, tree *idx, void *dta) +{ + bool overflow; + widest_int niter, valid_niter, delta, wi_step; + tree ev, init, step; + tree low, high; + struct loop *loop = (struct loop*) dta; + + /* Only support within-bound access for array references. */ + if (TREE_CODE (ref) != ARRAY_REF) + return false; + + /* For arrays at the end of the structure, we are not guaranteed that they + do not really extend over their declared size. However, for arrays of + size greater than one, this is unlikely to be intended. */ + if (array_at_struct_end_p (ref)) + return false; + + ev = analyze_scalar_evolution (loop, *idx); + ev = instantiate_parameters (loop, ev); + init = initial_condition (ev); + step = evolution_part_in_loop_num (ev, loop->num); + + if (!init || TREE_CODE (init) != INTEGER_CST + || (step && TREE_CODE (step) != INTEGER_CST)) + return false; + + low = array_ref_low_bound (ref); + high = array_ref_up_bound (ref); + + /* The case of nonconstant bounds could be handled, but it would be + complicated. */ + if (TREE_CODE (low) != INTEGER_CST + || !high || TREE_CODE (high) != INTEGER_CST) + return false; + + /* Check if the intial idx is within bound. */ + if (wi::to_widest (init) < wi::to_widest (low) + || wi::to_widest (init) > wi::to_widest (high)) + return false; + + /* The idx is always within bound. */ + if (!step || integer_zerop (step)) + return true; + + if (!max_loop_iterations (loop, &niter)) + return false; + + if (wi::to_widest (step) < 0) + { + delta = wi::to_widest (init) - wi::to_widest (low); + wi_step = -wi::to_widest (step); + } + else + { + delta = wi::to_widest (high) - wi::to_widest (init); + wi_step = wi::to_widest (step); + } + + valid_niter = wi::div_floor (delta, wi_step, SIGNED, &overflow); + /* The iteration space of idx is within array bound. */ + if (!overflow && niter <= valid_niter) + return true; + + return false; +} + +/* Return TRUE if ref is a within bound array reference. */ + +static bool +ref_within_array_bound (gimple *stmt, tree ref) +{ + struct loop *loop = loop_containing_stmt (stmt); + + gcc_assert (loop != NULL); + return for_each_index (&ref, idx_within_array_bound, loop); +} + + +/* Given a memory reference expression T, return TRUE if base object + it refers to is writable. The base object of a memory reference + is the main object being referenced, which is returned by function + get_base_address. */ + +static bool +base_object_writable (tree ref) +{ + tree base_tree = get_base_address (ref); + + return (base_tree + && DECL_P (base_tree) + && decl_binds_to_current_def_p (base_tree) + && !TREE_READONLY (base_tree)); +} + /* Return true when the memory references of STMT won't trap in the if-converted code. There are two things that we have to check for: @@ -787,8 +888,13 @@ ifcvt_memrefs_wont_trap (gimple *stmt, vec<data_reference_p> drs) if (DR_W_UNCONDITIONALLY (*master_dr)) return true; - /* If a is unconditionally accessed then ... */ - if (DR_RW_UNCONDITIONALLY (*master_dr)) + /* If a is unconditionally accessed then ... + + Even a is conditional access, we can treat it as an unconditional + one if it's an array reference and all its index are within array + bound. */ + if (DR_RW_UNCONDITIONALLY (*master_dr) + || ref_within_array_bound (stmt, DR_REF (a))) { /* an unconditional read won't trap. */ if (DR_IS_READ (a)) @@ -799,16 +905,11 @@ ifcvt_memrefs_wont_trap (gimple *stmt, vec<data_reference_p> drs) if (base_master_dr && DR_BASE_W_UNCONDITIONALLY (*base_master_dr)) return PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES); - else - { - /* or the base is know to be not readonly. */ - tree base_tree = get_base_address (DR_REF (a)); - if (DECL_P (base_tree) - && decl_binds_to_current_def_p (base_tree) - && ! TREE_READONLY (base_tree)) - return PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES); - } + /* or the base is known to be not readonly. */ + else if (base_object_writable (DR_REF (a))) + return PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES); } + return false; } diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c new file mode 100644 index 0000000..70b7422 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -fdump-tree-ifcvt-stats" } */ +/* { dg-require-visibility "" } */ + +int b[256] = {0}, y; +void bar (int *); +int foo (int x, int n) +{ + int i; + int a[128]; + + for (i = 0; i < n; i++) + { + a[i] = i; + if (x > i) + b[i] = y; + } + bar (a); + return 0; +} + +/* { dg-final { scan-tree-dump-times "Applying if-conversion" 1 "ifcvt" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c new file mode 100644 index 0000000..bacf428 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -fdump-tree-ifcvt-stats" } */ +/* { dg-require-visibility "" } */ + +int a[1024] = {0.0}; +int b[1024] = {0.0}; +int c[1024] = {0.0}; +int foo (float *x) +{ + int i = 0; + + for (i = 0; i < 1024; i++) + { + c[i] = (x[i] > 0.0) ? a[i] : b[i]; + } + + return 0; +} + +/* { dg-final { scan-tree-dump-times "Applying if-conversion" 1 "ifcvt" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c new file mode 100644 index 0000000..89d42b4 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -fdump-tree-ifcvt-stats" } */ +/* { dg-require-visibility "" } */ + +struct st +{ + int a[1024]; + int b[1024]; +}; + +struct st s = {0}; +int foo (int x) +{ + int i; + struct st *p = &s; + + for (i = 0; i < 1024; i++) + { + if (x > i) + p->a[i] = p->b[i]; + } + + return 0; +} +/* { dg-final { scan-tree-dump-times "Applying if-conversion" 1 "ifcvt" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c new file mode 100644 index 0000000..24c19c0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -fdump-tree-ifcvt-stats" } */ +/* { dg-require-visibility "" } */ + +extern int b[256], y; +void bar (int *, int); +int foo (int x, int n) +{ + int i; + int a[128]; + + for (i = 0; i < n; i++) + { + a[i] = i; + if (x > i) + y = b[i]; + } + bar (a, y); + return 0; +} + +/* { dg-final { scan-tree-dump-times "Applying if-conversion" 1 "ifcvt" } } */