On Thu, Nov 17, 2016 at 10:53 AM, Richard Biener <richard.guent...@gmail.com> wrote:
> On Thu, Nov 17, 2016 at 11:26 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Thu, Nov 17, 2016 at 8:32 AM, Richard Biener
>> <richard.guent...@gmail.com> wrote:
>>> On Wed, Nov 16, 2016 at 6:20 PM, Bin Cheng <bin.ch...@arm.com> wrote:
>>>> Hi,
>>>> Currently the test gfortran.dg/vect/fast-math-mgrid-resid.f checks all
>>>> predictive commoning opportunities for all possible loops.  This makes
>>>> it fragile because the vectorizer may peel the loop differently, and
>>>> may also choose different vector factors.  For example, on x86-solaris
>>>> the vectorizer doesn't peel a prologue loop; with -march=haswell the
>>>> case has been failing for a long time because the vector factor is 4,
>>>> while the iteration distance of the predictive commoning opportunity
>>>> is smaller than 4.  This patch refines the test to check only that a
>>>> predictive commoning variable is created when the vector factor is 2,
>>>> or that a vectorization variable is created when the factor is 4.
>>>> This works since we have only one main loop, and only one vector
>>>> factor can be used.  Test results checked on various x64 targets.
>>>> Is it OK?
>>>
>>> I think that as you wrote it the test is somewhat fragile.  But rather
>>> than adjusting the scanning like you do
>>> I'd add --param vect-max-peeling-for-alignment=0 and -mprefer-avx128
>> In that case, is it better to add "--param
>> vect-max-peeling-for-alignment=0" for all targets?  Otherwise we still
>> need different test strings for different targets.  But I have another
>> question: what if a target can't handle unaligned accesses and the
>> vectorizer has to peel for alignment for it?
>
> You'd get versioning for alignment instead.
>
>> Also, do you think it's OK to check the predictive commoning PHI node as below?
>>   # vectp_u.122__lsm0.158_94 = PHI <vectp_u.122__lsm0.158_95(8), _96(6)>
>> In this way, we don't need to take possible prologue/epilogue loops
>> into consideration.
>
> I hoped w/o peeling we could simply scan for "Executing predictive commoning".
> But with versioning for alignment you'd still get two loops.
>
> So maybe checking for both "Executing predictive commoning" and looking
> for a vect_lsm PHI node is ok...

Understood.  Here is the updated patch.  Test results checked on x86_64
and aarch64.  Is it OK?
Thanks,
bin

gcc/testsuite/ChangeLog
2016-11-15  Bin Cheng  <bin.ch...@arm.com>

	PR testsuite/78114
	* gfortran.dg/vect/fast-math-mgrid-resid.f: Add additional
	options.  Refine the test by checking predictive commoning PHI
	nodes in the vectorized loop wrt the vector factor.

>
>>> as additional option on x86_64-*-* i?86-*-*.
>>>
>>> Your new pattern would fail with avx512 if vector(8) real would be used.
>>>
>>> What's the actual change that made the testcase fail btw?
>> There are two cases.
>> A) After the vect_do_peeling change, the vectorizer may peel only one
>> iteration for the prologue loop (if vf == 2); the test string below
>> was added for this reason:
>>   ! { dg-final { scan-tree-dump-times "Loop iterates only 1 time,
>> nothing to do" 1 "pcom" } }
>> This fails on x86_64 Solaris because the prologue loop is not peeled
>> at all.
>> B) Depending on ilp, I think the test strings below have been failing
>> for a long time with haswell:
>>   ! { dg-final { scan-tree-dump-times "Executing predictive commoning
>> without unrolling" 1 "pcom" { target lp64 } } }
>>   ! { dg-final { scan-tree-dump-times "Executing predictive commoning
>> without unrolling" 2 "pcom" { target ia32 } } }
>> This is because the vectorizer chooses vf == 4 in this case, and there
>> are no predictive commoning opportunities at all.
>
> Yes.  I suggest -mprefer-avx128 for that.
>
>> Also the newly added test string fails in this case too because the
>> peeled prologue iterates more than 1 time.
>>
>> Thanks,
>> bin
>>>
>>> Richard.
>>>
>>>> Thanks,
>>>> bin
>>>>
>>>> gcc/testsuite/ChangeLog
>>>> 2016-11-16  Bin Cheng  <bin.ch...@arm.com>
>>>>
>>>> 	PR testsuite/78114
>>>> 	* gfortran.dg/vect/fast-math-mgrid-resid.f: Refine test by
>>>> 	checking predictive commoning variables in the vectorized loop
>>>> 	wrt the vector factor.
diff --git a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
index 88238f9..293cac9 100644
--- a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
+++ b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
@@ -1,7 +1,7 @@
 ! { dg-do compile }
 ! { dg-require-effective-target vect_double }
-! { dg-options "-O3 -fpredictive-commoning -fdump-tree-pcom-details" }
-
+! { dg-options "-O3 --param vect-max-peeling-for-alignment=0 -fpredictive-commoning -fdump-tree-pcom-details" }
+! { dg-additional-options "-mprefer-avx128" { target { i?86-*-* x86_64-*-* } } }
 ******* RESID COMPUTES THE RESIDUAL:  R = V - AU
 *
@@ -38,8 +38,8 @@ C
       RETURN
       END
 ! we want to check that predictive commoning did something on the
-! vectorized loop.
-! { dg-final { scan-tree-dump-times "Executing predictive commoning without unrolling" 1 "pcom" { target lp64 } } }
-! { dg-final { scan-tree-dump-times "Executing predictive commoning without unrolling" 2 "pcom" { target ia32 } } }
-! { dg-final { scan-tree-dump-times "Predictive commoning failed: no suitable chains" 0 "pcom" } }
-! { dg-final { scan-tree-dump-times "Loop iterates only 1 time, nothing to do" 1 "pcom" } }
+! vectorized loop.  If vector factor is 2, the vectorized loop can
+! be predictive commoned, we check if predictive commoning variable
+! is created with vector(2) type.
+! { dg-final { scan-tree-dump "Executing predictive commoning without unrolling" "pcom" } }
+! { dg-final { scan-tree-dump "vectp_u.*__lsm.* = PHI <.*vectp_u.*__lsm" "pcom" } }