Richard Biener <rguent...@suse.de> writes:
> On Tue, 30 Jun 2020, Richard Sandiford wrote:
>
>> Richard Biener <richard.guent...@gmail.com> writes:
>> > On Tue, Jun 30, 2020 at 2:18 PM Richard Sandiford
>> > <richard.sandif...@arm.com> wrote:
>> >>
>> >> "Yangfei (Felix)" <felix.y...@huawei.com> writes:
>> >> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c 
>> >> > b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
>> >> > index e050db1a2e4..ea39fcac0e0 100644
>> >> > --- a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
>> >> > +++ b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
>> >> > @@ -1,6 +1,7 @@
>> >> >  /* { dg-do compile } */
>> >> >  /* { dg-additional-options "-O3" } */
>> >> >  /* { dg-additional-options "-mavx2" { target { i?86-*-* x86_64-*-* } } 
>> >> > } */
>> >> > +/* { dg-additional-options "-march=armv8.2-a+sve -fno-vect-cost-model" 
>> >> > { target aarch64*-*-* } } */
>> >> >
>> >> >  typedef struct {
>> >> >      unsigned short mprr_2[5][16][16];
>> >>
>> >> This test is useful for Advanced SIMD too, so I think we should continue
>> >> to test it with whatever options the person running the testsuite chose.
>> >> Instead we could duplicate the test in gcc.target/aarch64/sve with
>> >> appropriate options.
>> >>
>> >> > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
>> >> > index eb8288e7a85..b30a7d8a3bb 100644
>> >> > --- a/gcc/tree-vect-data-refs.c
>> >> > +++ b/gcc/tree-vect-data-refs.c
>> >> > @@ -1823,8 +1823,11 @@ vect_enhance_data_refs_alignment (loop_vec_info 
>> >> > loop_vinfo)
>> >> >               {
>> >> >                 poly_uint64 nscalars = (STMT_SLP_TYPE (stmt_info)
>> >> >                                         ? vf * DR_GROUP_SIZE 
>> >> > (stmt_info) : vf);
>> >> > -               possible_npeel_number
>> >> > -                 = vect_get_num_vectors (nscalars, vectype);
>> >> > +               if (maybe_lt (nscalars, TYPE_VECTOR_SUBPARTS (vectype)))
>> >> > +                 possible_npeel_number = 0;
>> >> > +               else
>> >> > +                 possible_npeel_number
>> >> > +                   = vect_get_num_vectors (nscalars, vectype);
>> >> >
>> >> >                 /* NPEEL_TMP is 0 when there is no misalignment, but 
>> >> > also
>> >> >                    allow peeling NELEMENTS.  */
>> >>
>> >> OK, so this is coming from:
>> >>
>> >>   int s[16][2];
>> >>   …
>> >>   … =s[j][1];
>> >>
>> >> and an SLP node containing 16 instances of “s[j][1]”.  The DR_GROUP_SIZE
>> >> is 2 because that's the inner dimension of “s”.
>> >>
>> >> I don't think maybe_lt is right here though.  The same problem could in
>> >> principle happen for cases in which NSCALARS > TYPE_VECTOR_SUBPARTS,
>> >> e.g. for different inner dimensions of “s”.
>> >>
>> >> I think the testcase shows that using DR_GROUP_SIZE in this calculation
>> >> is flawed.  I'm not sure whether we can really do better given the current
>> >> representation though.  This is one case where having a separate 
>> >> dr_vec_info
>> >> per SLP node would help.
>> >
>> > I guess what the code likes to know is what we now have in SLP_TREE_LANES
>> > (or formerly group_size) but that's not necessarily connected to 
>> > DR_GROUP_SIZE.
>> > Given we only see a stmt_info here and there's no 1:1 mapping of SLP node
>> > to stmt_info (and the reverse mapping doesn't even exist) I do not have
>> > any good idea either.
>> >
>> > Honestly I don't really see what this code tries to do ... doesn't it
>> > simply want to set possible_npeel_number to TYPE_VECTOR_SUBPARTS 
>> > (vectype)?!
>> 
>> I think it's trying to set possible_npeel_number to the number of
>> vector stmts per iteration (i.e. ncopies for non-SLP stuff):
>> 
>>               /* For multiple types, it is possible that the bigger type 
>> access
>>                  will have more than one peeling option.  E.g., a loop with 
>> two
>>                  types: one of size (vector size / 4), and the other one of
>>                  size (vector size / 8).  Vectorization factor will 8.  If 
>> both
>>                  accesses are misaligned by 3, the first one needs one scalar
>>                  iteration to be aligned, and the second one needs 5.  But 
>> the
>>                  first one will be aligned also by peeling 5 scalar
>>                  iterations, and in that case both accesses will be aligned.
>>                  Hence, except for the immediate peeling amount, we also want
>>                  to try to add full vector size, while we don't exceed
>>                  vectorization factor.
>>                  We do this automatically for cost model, since we calculate
>>                  cost for every peeling option.  */
>> 
>> So for this example, possible_npeel_number==2 for the first access
>> (letting us try two peeling values for it) and possible_npeel_number==1
>> for the second.
>
> Yeah, but npeel is _scalar_ lanes, since we always peel scalar iterations.
> So it seems odd to somehow put in the number of vectors...  so to me
> it would have made sense if it did
>
>   possible_npeel_number = lower_bound (nscalars);
>
> or whateveris necessary to make the polys happy.  Thus simply elide
> the vect_get_num_vectors call?  But it's been very long since I've
> dived into the alignment peeling code...

Ah, I see what you mean.  So rather than:

              /* Save info about DR in the hash table.  Also include peeling
                 amounts according to the explanation above.  */
              for (j = 0; j < possible_npeel_number; j++)
                {
                  vect_peeling_hash_insert (&peeling_htab, loop_vinfo,
                                            dr_info, npeel_tmp);
                  npeel_tmp += target_align / dr_size;
                }

just have something like:

              while (known_le (npeel_tmp, nscalars))
                {
                  …
                }

?

Reply via email to