On Thu, Nov 17, 2016 at 10:53 AM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Thu, Nov 17, 2016 at 11:26 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Thu, Nov 17, 2016 at 8:32 AM, Richard Biener
>> <richard.guent...@gmail.com> wrote:
>>> On Wed, Nov 16, 2016 at 6:20 PM, Bin Cheng <bin.ch...@arm.com> wrote:
>>>> Hi,
>>>> Currently test gfortran.dg/vect/fast-math-mgrid-resid.f checks all
>>>> predictive commoning opportunities for all possible loops.  This makes it
>>>> fragile because the vectorizer may peel the loop differently and may
>>>> choose different vector factors.  For example, on x86-solaris the
>>>> vectorizer doesn't peel a prologue loop; with -march=haswell the test has
>>>> been failing for a long time because the vector factor is 4, while the
>>>> iteration distance of the predictive commoning opportunity is smaller
>>>> than 4.  This patch refines it by only checking whether a predictive
>>>> commoning variable is created when the vector factor is 2, or a
>>>> vectorization variable is created when the factor is 4.  This works
>>>> since we have only one main loop, and only one vector factor can be used.
>>>> Test result checked for various x64 targets.  Is it OK?
>>>
>>> I think that, as you write, the test is somewhat fragile.  But rather
>>> than adjusting the scanning like you do
>>> I'd add --param vect-max-peeling-for-alignment=0 and -mprefer-avx128
>> In this way, is it better to add "--param
>> vect-max-peeling-for-alignment=0" for all targets?  Otherwise we still
>> need to differentiate test string to handle different targets.  But I
>> have another question here: what if a target can't handle unaligned
>> access and the vectorizer has to peel for alignment for it?
>
> You'd get versioning for alignment instead.
>
>> Also do you think it's ok to check predictive commoning PHI node as below?
>> # vectp_u.122__lsm0.158_94 = PHI <vectp_u.122__lsm0.158_95(8), _96(6)>
>> In this way, we don't need to take possible prologue/epilogue loops
>> into consideration.
>
> I hoped w/o peeling we can simply scan for "Executing predictive commoning".
> But with versioning for alignment you'd still get two loops.
>
> So maybe checking for both "Executing predictive commoning" and looking
> for a vect_lsm PHI node is ok...
Understood.  Here is the updated patch.  Test result checked on x86_64
and aarch64.  Is it OK?

Thanks,
bin

gcc/testsuite/ChangeLog
2016-11-15  Bin Cheng  <bin.ch...@arm.com>

    PR testsuite/78114
    * gfortran.dg/vect/fast-math-mgrid-resid.f: Add additional
    options.  Refine test by checking predictive commoning PHI
    nodes in vectorized loop wrt vector factor.

>
>>> as additional option on x86_64-*-* i?86-*-*.
>>>
>>> Your new pattern would fail with avx512 if vector (8) real would be used.
>>>
>>> What's the actual change that made the testcase fail btw?
>> There are two cases.
>> A) After the vect_do_peeling change, the vectorizer may peel only one
>> iteration for the prologue loop (if vf == 2); the test string below was
>> added for this reason:
>> ! { dg-final { scan-tree-dump-times "Loop iterates only 1 time, nothing to do" 1 "pcom" } }
>> This fails on x86_64 solaris because the prologue loop is not peeled at all.
>> B) Depending on ilp, I think the test strings below have been failing
>> for a long time with haswell:
>> ! { dg-final { scan-tree-dump-times "Executing predictive commoning without unrolling" 1 "pcom" { target lp64 } } }
>> ! { dg-final { scan-tree-dump-times "Executing predictive commoning without unrolling" 2 "pcom" { target ia32 } } }
>> This is because the vectorizer chooses vf==4 in this case, and there are
>> no predictive commoning opportunities at all.
>
> Yes.  I suggest -mprefer-avx128 for that.
>
>> Also the newly added test string fails in this case too because the
>> peeled prologue iterates more than once.
>>
>> Thanks,
>> bin
>>>
>>> Richard.
>>>
>>>> Thanks,
>>>> bin
>>>>
>>>> gcc/testsuite/ChangeLog
>>>> 2016-11-16  Bin Cheng  <bin.ch...@arm.com>
>>>>
>>>>         PR testsuite/78114
>>>>         * gfortran.dg/vect/fast-math-mgrid-resid.f: Refine test by
>>>>         checking predictive commoning variables in vectorized loop
>>>>         wrt vector factor.
diff --git a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
index 88238f9..293cac9 100644
--- a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
+++ b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
@@ -1,7 +1,7 @@
 ! { dg-do compile }
 ! { dg-require-effective-target vect_double }
-! { dg-options "-O3 -fpredictive-commoning -fdump-tree-pcom-details" }
-
+! { dg-options "-O3 --param vect-max-peeling-for-alignment=0 -fpredictive-commoning -fdump-tree-pcom-details" }
+! { dg-additional-options "-mprefer-avx128" { target { i?86-*-* x86_64-*-* } } }
 
 ******* RESID COMPUTES THE RESIDUAL:  R = V - AU
 *
@@ -38,8 +38,8 @@ C
       RETURN
       END
 ! we want to check that predictive commoning did something on the
-! vectorized loop.
-! { dg-final { scan-tree-dump-times "Executing predictive commoning without unrolling" 1 "pcom" { target lp64 } } }
-! { dg-final { scan-tree-dump-times "Executing predictive commoning without unrolling" 2 "pcom" { target ia32 } } }
-! { dg-final { scan-tree-dump-times "Predictive commoning failed: no suitable chains" 0 "pcom" } }
-! { dg-final { scan-tree-dump-times "Loop iterates only 1 time, nothing to do" 1 "pcom" } }
+! vectorized loop.  If vector factor is 2, the vectorized loop can
+! be predictive commoned, we check if predictive commoning variable
+! is created with vector(2) type.
+! { dg-final { scan-tree-dump "Executing predictive commoning without unrolling" "pcom" } }
+! { dg-final { scan-tree-dump "vectp_u.*__lsm.* = PHI <.*vectp_u.*__lsm" "pcom" } }

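For anyone who wants to sanity-check the new scan pattern outside the
testsuite, here is a rough Python sketch (not part of the patch).  The
pattern string and the PHI line are copied from this thread; the helper
name has_pcom_phi and the non-matching sample line are made up for
illustration.  DejaGnu evaluates the pattern with Tcl's regexp over the
whole dump file, so this single-line check is only an approximation of
what the testsuite does.

```python
import re

# Pattern copied from the new dg-final directive in the patch.
PCOM_PHI_PATTERN = r"vectp_u.*__lsm.* = PHI <.*vectp_u.*__lsm"

# PHI node line quoted earlier in this thread.
phi_line = "# vectp_u.122__lsm0.158_94 = PHI <vectp_u.122__lsm0.158_95(8), _96(6)>"

# Hypothetical PHI line that has no predictive commoning variable.
plain_line = "# ivtmp_12 = PHI <ivtmp_13(8), 0(6)>"


def has_pcom_phi(dump_text):
    """Return True if dump_text contains a vectp_u...__lsm PHI node."""
    return re.search(PCOM_PHI_PATTERN, dump_text) is not None


if __name__ == "__main__":
    print(has_pcom_phi(phi_line))
    print(has_pcom_phi(plain_line))
```

The pattern does not mention the vector type or the SSA version numbers,
so it should not care whether prologue or epilogue loops exist.
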