Hi, > -----Original Message----- > From: Richard Biener [mailto:rguent...@suse.de] > Sent: Tuesday, June 30, 2020 10:50 PM > To: Richard Sandiford <richard.sandif...@arm.com> > Cc: Richard Biener <richard.guent...@gmail.com>; Yangfei (Felix) > <felix.y...@huawei.com>; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH PR95961] vect: ICE: in exact_div, at poly-int.h:2182 > > On June 30, 2020 4:23:03 PM GMT+02:00, Richard Sandiford > <richard.sandif...@arm.com> wrote: > >Richard Biener <rguent...@suse.de> writes: > >> On Tue, 30 Jun 2020, Richard Sandiford wrote: > >> > >>> Richard Biener <richard.guent...@gmail.com> writes: > >>> > On Tue, Jun 30, 2020 at 2:18 PM Richard Sandiford > >>> > <richard.sandif...@arm.com> wrote: > >>> >> > >>> >> "Yangfei (Felix)" <felix.y...@huawei.com> writes: > >>> >> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c > >b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c > >>> >> > index e050db1a2e4..ea39fcac0e0 100644 > >>> >> > --- a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c > >>> >> > +++ b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c > >>> >> > @@ -1,6 +1,7 @@ > >>> >> > /* { dg-do compile } */ > >>> >> > /* { dg-additional-options "-O3" } */ > >>> >> > /* { dg-additional-options "-mavx2" { target { i?86-*-* > >x86_64-*-* } } } */ > >>> >> > +/* { dg-additional-options "-march=armv8.2-a+sve > >-fno-vect-cost-model" { target aarch64*-*-* } } */ > >>> >> > > >>> >> > typedef struct { > >>> >> > unsigned short mprr_2[5][16][16]; > >>> >> > >>> >> This test is useful for Advanced SIMD too, so I think we should > >continue > >>> >> to test it with whatever options the person running the testsuite > >chose. > >>> >> Instead we could duplicate the test in gcc.target/aarch64/sve > >with > >>> >> appropriate options. > >>> >> > >>> >> > diff --git a/gcc/tree-vect-data-refs.c > >b/gcc/tree-vect-data-refs.c > >>> >> > index eb8288e7a85..b30a7d8a3bb 100644 > >>> >> > --- a/gcc/tree-vect-data-refs.c > >>> >> > +++ b/gcc/tree-vect-data-refs.c > >>> >> > @@ -1823,8 +1823,11 @@ vect_enhance_data_refs_alignment > >(loop_vec_info loop_vinfo) > >>> >> > { > >>> >> > poly_uint64 nscalars = (STMT_SLP_TYPE > >(stmt_info) > >>> >> > ? vf * DR_GROUP_SIZE > >(stmt_info) : vf); > >>> >> > - possible_npeel_number > >>> >> > - = vect_get_num_vectors (nscalars, vectype); > >>> >> > + if (maybe_lt (nscalars, TYPE_VECTOR_SUBPARTS > >(vectype))) > >>> >> > + possible_npeel_number = 0; > >>> >> > + else > >>> >> > + possible_npeel_number > >>> >> > + = vect_get_num_vectors (nscalars, vectype); > >>> >> > > >>> >> > /* NPEEL_TMP is 0 when there is no > >misalignment, but also > >>> >> > allow peeling NELEMENTS. */ > >>> >> > >>> >> OK, so this is coming from: > >>> >> > >>> >> int s[16][2]; > >>> >> … > >>> >> … =s[j][1]; > >>> >> > >>> >> and an SLP node containing 16 instances of “s[j][1]”. The > >DR_GROUP_SIZE > >>> >> is 2 because that's the inner dimension of “s”. > >>> >> > >>> >> I don't think maybe_lt is right here though. The same problem > >could in > >>> >> principle happen for cases in which NSCALARS > > >TYPE_VECTOR_SUBPARTS, > >>> >> e.g. for different inner dimensions of “s”. > >>> >> > >>> >> I think the testcase shows that using DR_GROUP_SIZE in this > >calculation > >>> >> is flawed. I'm not sure whether we can really do better given > >the current > >>> >> representation though. This is one case where having a separate > >dr_vec_info > >>> >> per SLP node would help. > >>> > > >>> > I guess what the code likes to know is what we now have in > >SLP_TREE_LANES > >>> > (or formerly group_size) but that's not necessarily connected to > >DR_GROUP_SIZE. > >>> > Given we only see a stmt_info here and there's no 1:1 mapping of > >SLP node > >>> > to stmt_info (and the reverse mapping doesn't even exist) I do not > >have > >>> > any good idea either. > >>> > > >>> > Honestly I don't really see what this code tries to do ... doesn't > >it > >>> > simply want to set possible_npeel_number to > TYPE_VECTOR_SUBPARTS > >(vectype)?! > >>> > >>> I think it's trying to set possible_npeel_number to the number of > >>> vector stmts per iteration (i.e. ncopies for non-SLP stuff): > >>> > >>> /* For multiple types, it is possible that the bigger > >type access > >>> will have more than one peeling option. E.g., a > >loop with two > >>> types: one of size (vector size / 4), and the other > >one of > >>> size (vector size / 8). Vectorization factor will > >8. If both > >>> accesses are misaligned by 3, the first one needs > >one scalar > >>> iteration to be aligned, and the second one needs > >5. But the > >>> first one will be aligned also by peeling 5 scalar > >>> iterations, and in that case both accesses will be > >aligned. > >>> Hence, except for the immediate peeling amount, we > >also want > >>> to try to add full vector size, while we don't > >exceed > >>> vectorization factor. > >>> We do this automatically for cost model, since we > >calculate > >>> cost for every peeling option. */ > >>> > >>> So for this example, possible_npeel_number==2 for the first access > >>> (letting us try two peeling values for it) and > >possible_npeel_number==1 > >>> for the second. > >> > >> Yeah, but npeel is _scalar_ lanes, since we always peel scalar > >iterations. > >> So it seems odd to somehow put in the number of vectors... so to me > >> it would have made sense if it did > >> > >> possible_npeel_number = lower_bound (nscalars); > >> > >> or whateveris necessary to make the polys happy. Thus simply elide > >> the vect_get_num_vectors call? But it's been very long since I've > >> dived into the alignment peeling code... > > > >Ah, I see what you mean. So rather than: > > > > /* Save info about DR in the hash table. Also include peeling > > amounts according to the explanation above. */ > > for (j = 0; j < possible_npeel_number; j++) > > { > > vect_peeling_hash_insert (&peeling_htab, loop_vinfo, > > dr_info, npeel_tmp); > > npeel_tmp += target_align / dr_size; > > } > > > >just have something like: > > > > while (known_le (npeel_tmp, nscalars)) > > { > > … > > } > > > >? > > Yeah.
Not sure if I understand correctly. I am supposing the following check in the original code is not necessary if we go like that. 1822 if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo))) Is that correct? Thanks, Felix