Richard Biener <rguent...@suse.de> writes: > On Tue, 30 Jun 2020, Richard Sandiford wrote: > >> Richard Biener <richard.guent...@gmail.com> writes: >> > On Tue, Jun 30, 2020 at 2:18 PM Richard Sandiford >> > <richard.sandif...@arm.com> wrote: >> >> >> >> "Yangfei (Felix)" <felix.y...@huawei.com> writes: >> >> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c >> >> > b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c >> >> > index e050db1a2e4..ea39fcac0e0 100644 >> >> > --- a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c >> >> > +++ b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c >> >> > @@ -1,6 +1,7 @@ >> >> > /* { dg-do compile } */ >> >> > /* { dg-additional-options "-O3" } */ >> >> > /* { dg-additional-options "-mavx2" { target { i?86-*-* x86_64-*-* } } >> >> > } */ >> >> > +/* { dg-additional-options "-march=armv8.2-a+sve -fno-vect-cost-model" >> >> > { target aarch64*-*-* } } */ >> >> > >> >> > typedef struct { >> >> > unsigned short mprr_2[5][16][16]; >> >> >> >> This test is useful for Advanced SIMD too, so I think we should continue >> >> to test it with whatever options the person running the testsuite chose. >> >> Instead we could duplicate the test in gcc.target/aarch64/sve with >> >> appropriate options. >> >> >> >> > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c >> >> > index eb8288e7a85..b30a7d8a3bb 100644 >> >> > --- a/gcc/tree-vect-data-refs.c >> >> > +++ b/gcc/tree-vect-data-refs.c >> >> > @@ -1823,8 +1823,11 @@ vect_enhance_data_refs_alignment (loop_vec_info >> >> > loop_vinfo) >> >> > { >> >> > poly_uint64 nscalars = (STMT_SLP_TYPE (stmt_info) >> >> > ? vf * DR_GROUP_SIZE >> >> > (stmt_info) : vf); >> >> > - possible_npeel_number >> >> > - = vect_get_num_vectors (nscalars, vectype); >> >> > + if (maybe_lt (nscalars, TYPE_VECTOR_SUBPARTS (vectype))) >> >> > + possible_npeel_number = 0; >> >> > + else >> >> > + possible_npeel_number >> >> > + = vect_get_num_vectors (nscalars, vectype); >> >> > >> >> > /* NPEEL_TMP is 0 when there is no misalignment, but >> >> > also >> >> > allow peeling NELEMENTS. */ >> >> >> >> OK, so this is coming from: >> >> >> >> int s[16][2]; >> >> … >> >> … =s[j][1]; >> >> >> >> and an SLP node containing 16 instances of “s[j][1]”. The DR_GROUP_SIZE >> >> is 2 because that's the inner dimension of “s”. >> >> >> >> I don't think maybe_lt is right here though. The same problem could in >> >> principle happen for cases in which NSCALARS > TYPE_VECTOR_SUBPARTS, >> >> e.g. for different inner dimensions of “s”. >> >> >> >> I think the testcase shows that using DR_GROUP_SIZE in this calculation >> >> is flawed. I'm not sure whether we can really do better given the current >> >> representation though. This is one case where having a separate >> >> dr_vec_info >> >> per SLP node would help. >> > >> > I guess what the code likes to know is what we now have in SLP_TREE_LANES >> > (or formerly group_size) but that's not necessarily connected to >> > DR_GROUP_SIZE. >> > Given we only see a stmt_info here and there's no 1:1 mapping of SLP node >> > to stmt_info (and the reverse mapping doesn't even exist) I do not have >> > any good idea either. >> > >> > Honestly I don't really see what this code tries to do ... doesn't it >> > simply want to set possible_npeel_number to TYPE_VECTOR_SUBPARTS >> > (vectype)?! >> >> I think it's trying to set possible_npeel_number to the number of >> vector stmts per iteration (i.e. ncopies for non-SLP stuff): >> >> /* For multiple types, it is possible that the bigger type >> access >> will have more than one peeling option. E.g., a loop with >> two >> types: one of size (vector size / 4), and the other one of >> size (vector size / 8). Vectorization factor will 8. If >> both >> accesses are misaligned by 3, the first one needs one scalar >> iteration to be aligned, and the second one needs 5. But >> the >> first one will be aligned also by peeling 5 scalar >> iterations, and in that case both accesses will be aligned. >> Hence, except for the immediate peeling amount, we also want >> to try to add full vector size, while we don't exceed >> vectorization factor. >> We do this automatically for cost model, since we calculate >> cost for every peeling option. */ >> >> So for this example, possible_npeel_number==2 for the first access >> (letting us try two peeling values for it) and possible_npeel_number==1 >> for the second. > > Yeah, but npeel is _scalar_ lanes, since we always peel scalar iterations. > So it seems odd to somehow put in the number of vectors... so to me > it would have made sense if it did > > possible_npeel_number = lower_bound (nscalars); > > or whateveris necessary to make the polys happy. Thus simply elide > the vect_get_num_vectors call? But it's been very long since I've > dived into the alignment peeling code...
Ah, I see what you mean. So rather than: /* Save info about DR in the hash table. Also include peeling amounts according to the explanation above. */ for (j = 0; j < possible_npeel_number; j++) { vect_peeling_hash_insert (&peeling_htab, loop_vinfo, dr_info, npeel_tmp); npeel_tmp += target_align / dr_size; } just have something like: while (known_le (npeel_tmp, nscalars)) { … } ?