Richard Biener <richard.guent...@gmail.com> writes:
> On Tue, Jun 30, 2020 at 2:18 PM Richard Sandiford
> <richard.sandif...@arm.com> wrote:
>>
>> "Yangfei (Felix)" <felix.y...@huawei.com> writes:
>> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c 
>> > b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
>> > index e050db1a2e4..ea39fcac0e0 100644
>> > --- a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
>> > +++ b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
>> > @@ -1,6 +1,7 @@
>> >  /* { dg-do compile } */
>> >  /* { dg-additional-options "-O3" } */
>> >  /* { dg-additional-options "-mavx2" { target { i?86-*-* x86_64-*-* } } } 
>> > */
>> > +/* { dg-additional-options "-march=armv8.2-a+sve -fno-vect-cost-model" { 
>> > target aarch64*-*-* } } */
>> >
>> >  typedef struct {
>> >      unsigned short mprr_2[5][16][16];
>>
>> This test is useful for Advanced SIMD too, so I think we should continue
>> to test it with whatever options the person running the testsuite chose.
>> Instead we could duplicate the test in gcc.target/aarch64/sve with
>> appropriate options.
>>
>> > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
>> > index eb8288e7a85..b30a7d8a3bb 100644
>> > --- a/gcc/tree-vect-data-refs.c
>> > +++ b/gcc/tree-vect-data-refs.c
>> > @@ -1823,8 +1823,11 @@ vect_enhance_data_refs_alignment (loop_vec_info 
>> > loop_vinfo)
>> >               {
>> >                 poly_uint64 nscalars = (STMT_SLP_TYPE (stmt_info)
>> >                                         ? vf * DR_GROUP_SIZE (stmt_info) : 
>> > vf);
>> > -               possible_npeel_number
>> > -                 = vect_get_num_vectors (nscalars, vectype);
>> > +               if (maybe_lt (nscalars, TYPE_VECTOR_SUBPARTS (vectype)))
>> > +                 possible_npeel_number = 0;
>> > +               else
>> > +                 possible_npeel_number
>> > +                   = vect_get_num_vectors (nscalars, vectype);
>> >
>> >                 /* NPEEL_TMP is 0 when there is no misalignment, but also
>> >                    allow peeling NELEMENTS.  */
>>
>> OK, so this is coming from:
>>
>>   int s[16][2];
>>   …
>>   … =s[j][1];
>>
>> and an SLP node containing 16 instances of “s[j][1]”.  The DR_GROUP_SIZE
>> is 2 because that's the inner dimension of “s”.
>>
>> I don't think maybe_lt is right here though.  The same problem could in
>> principle happen for cases in which NSCALARS > TYPE_VECTOR_SUBPARTS,
>> e.g. for different inner dimensions of “s”.
>>
>> I think the testcase shows that using DR_GROUP_SIZE in this calculation
>> is flawed.  I'm not sure whether we can really do better given the current
>> representation though.  This is one case where having a separate dr_vec_info
>> per SLP node would help.
>
> I guess what the code likes to know is what we now have in SLP_TREE_LANES
> (or formerly group_size) but that's not necessarily connected to 
> DR_GROUP_SIZE.
> Given we only see a stmt_info here and there's no 1:1 mapping of SLP node
> to stmt_info (and the reverse mapping doesn't even exist) I do not have
> any good idea either.
>
> Honestly I don't really see what this code tries to do ... doesn't it
> simply want to set possible_npeel_number to TYPE_VECTOR_SUBPARTS (vectype)?!

I think it's trying to set possible_npeel_number to the number of
vector stmts per iteration (i.e. ncopies for non-SLP stuff):

              /* For multiple types, it is possible that the bigger type access
                 will have more than one peeling option.  E.g., a loop with two
                 types: one of size (vector size / 4), and the other one of
                 size (vector size / 8).  Vectorization factor will 8.  If both
                 accesses are misaligned by 3, the first one needs one scalar
                 iteration to be aligned, and the second one needs 5.  But the
                 first one will be aligned also by peeling 5 scalar
                 iterations, and in that case both accesses will be aligned.
                 Hence, except for the immediate peeling amount, we also want
                 to try to add full vector size, while we don't exceed
                 vectorization factor.
                 We do this automatically for cost model, since we calculate
                 cost for every peeling option.  */

So for this example, possible_npeel_number==2 for the first access
(letting us try two peeling values for it) and possible_npeel_number==1
for the second.

Thanks,
Richard

Reply via email to