On Thu, Aug 22, 2024 at 1:03 AM Edwin Lu <e...@rivosinc.com> wrote: > > Hi, > > Just wanted to ping this for more guidance.
It's difficult for me as long as I cannot investigate this with a testcase. Can we go ahead with the other parts so the testcase can be added and the issue reproduced? Richard. > Edwin > > On 7/24/2024 12:03 PM, Edwin Lu wrote: > > > > On 7/24/2024 3:52 AM, Richard Biener wrote: > >> On Wed, Jul 24, 2024 at 1:31 AM Edwin Lu <e...@rivosinc.com> wrote: > >>> > >>> On 7/23/2024 11:20 AM, Richard Sandiford wrote: > >>>> Edwin Lu <e...@rivosinc.com> writes: > >>>>> On 7/23/2024 4:56 AM, Richard Biener wrote: > >>>>>> On Tue, Jul 23, 2024 at 1:03 AM Edwin Lu <e...@rivosinc.com> wrote: > >>>>>>> Hi Richard, > >>>>>>> > >>>>>>> On 5/31/2024 1:48 AM, Richard Biener wrote: > >>>>>>>> On Thu, May 30, 2024 at 2:11 AM Patrick O'Neill > >>>>>>>> <patr...@rivosinc.com> wrote: > >>>>>>>>> From: Greg McGary <g...@rivosinc.com> > >>>>>>>> Still a NACK. If remain ends up zero then > >>>>>>>> > >>>>>>>> /* Try to use a single smaller load when > >>>>>>>> we are about > >>>>>>>> to load excess elements compared to > >>>>>>>> the unrolled > >>>>>>>> scalar loop. */ > >>>>>>>> if (known_gt ((vec_num * j + i + 1) * > >>>>>>>> nunits, > >>>>>>>> (group_size * vf - > >>>>>>>> gap))) > >>>>>>>> { > >>>>>>>> poly_uint64 remain = ((group_size * > >>>>>>>> vf - gap) > >>>>>>>> - (vec_num * j > >>>>>>>> + i) * nunits); > >>>>>>>> if (known_ge ((vec_num * j + i + 1) > >>>>>>>> * nunits > >>>>>>>> - (group_size * vf - > >>>>>>>> gap), nunits)) > >>>>>>>> /* DR will be unused. */ > >>>>>>>> ltype = NULL_TREE; > >>>>>>>> > >>>>>>>> needs to be re-formulated so that the combined conditions make sure > >>>>>>>> this doesn't happen. The outer known_gt should already ensure that > >>>>>>>> remain > 0. For correctness that should possibly be maybe_gt > >>>>>>>> though. > >>>> Yeah. FWIW, I mentioned the maybe_gt thing in > >>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653013.html: > >>>> > >>>> Pre-existing, but shouldn't this be maybe_gt rather than known_gt? > >>>> We can only skip doing it if we know for sure that the load > >>>> won't cross > >>>> the gap. (Not sure whether the difference can trigger in > >>>> practice.) > >>>> > >>>> But AFAICT, the known_gt doesn't inherently prove that remain is known > >>>> to be nonzero. It just proves that the gap between the end of the > >>>> scalar > >>>> accesses and the end of this vector is known to be nonzero. > >>>> > >>>> [switching round for easier reply] > [removed some clarification questions about poly_int and vector > representations] > >>>>>> What is j and i when the divisor is zero? > >>>>> The values I see in gdb are: vec_num = 4 j = 0 i = 3 vf = {coeffs = > >>>>> {2, > >>>>> 2}} nunits = {coeffs = {2, 2}} group_size = 4 gap = 2 vect_align = 2 > >>>>> remain = {coeffs = {0, 2}} > >>>> OK, so let's use D to mean "data" and G to mean "gap". Then, for the > >>>> minimum vector length of 2 elements, we have: > >>>> > >>>> DD GG DD GG > >>>> > >>>> The last load will read beyond the scalar loop if the vector loop > >>>> happens > >>>> to handle all elements of the scalar loop. > >>>> > >>>> For a vector length of 4 elements, we have: > >>>> > >>>> DDGG DDGG DDGG DDGG > >>>> > >>>> where every load contains both data and gaps. The same will be true > >>>> for larger vectors. > >>>> > >>>> That's where remain={0,2} is coming from. The last load is fully > >>>> redundant > >>>> for the minimum VL but not for larger VL. > >>>> Based on that, the patch below looks correct to me, but I might have > >>>> misunderstood the intent. > >>> As an alternative to the original patch, would this also make sense? > >>> > >>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > >>> index aab3aa59962..cd657ac63af 100644 > >>> --- a/gcc/tree-vect-stmts.cc > >>> +++ b/gcc/tree-vect-stmts.cc > >>> @@ -11479,7 +11479,7 @@ vectorizable_load (vec_info *vinfo, > >>> /* Try to use a single smaller load when we are > >>> about > >>> to load excess elements compared to the > >>> unrolled > >>> scalar loop. */ > >>> - if (known_gt ((vec_num * j + i + 1) * nunits, > >>> + if (maybe_gt ((vec_num * j + i + 1) * nunits, > >>> (group_size * vf - gap))) > >>> { > >>> poly_uint64 remain = ((group_size * vf - gap) > >> That's a good point - this should indeed be maybe_gt > >> > >>> @@ -11502,6 +11502,10 @@ vectorizable_load (vec_info *vinfo, > >>> /* Aligned access to the gap area when > >>> there's > >>> at least one element in it is OK. */ > >>> ; > >>> + else if (maybe_eq (remain, 0)) > >>> + /* Handle remain.coeffs[0] == 0 case. > >>> Number of > >>> + elements is an exact multiple of vector > >>> length. */ > >>> + ; > >> Hmm, but here it should be known_eq? And why doesn't > >> constant_multiple_p work then? I think remain.coeffs[1] is never > >> negative here, but that would be another way to arrive at zero. > > > > If I use known_eq, the condition would be skipped and we'll enter the > > final else statement which triggers the floating point exception in > > constant_multiple_p. > > > > I don't know what it would mean for remain.coeffs[1] to be negative. > > To me it doesn't really make sense given my understanding that > > it represents the number of additional chunks beyond the minimum vector > > size. I think it would make sense to assert remain.coeffs[1] >= 0? > > > > I believe it would need to be maybe_eq since if remain.coeffs[1] is > > non-negative, the only way for remain == 0 is if X == 0 and > > remain.coeffs[0] == 0. > > > >> So in principle this would amend > >> > >> if (known_ge ((vec_num * j + i + 1) * nunits > >> - (group_size * vf - gap), > >> nunits)) > >> /* DR will be unused. */ > >> ltype = NULL_TREE; > >> > >> ? > > > > If we know remain == 0, then yes, I think it does amend that statement. > > However, since we don't know what X is, we aren't guaranteed that the load > > will eventually not be used. > > > >> Sure the original patch would work, but then the corresponding change > >> in get_group_loadstore_type needs to be done to ensure if remain > >> maybe zero we're forcing peeling for gaps, no? > > > > If remain may be zero, then wouldn't all the data already be loaded into > > vectors during the vector loop? Would we still need to peel for gaps > > in the scalar loop if all the data is processed? > > > > I'm not sure if my understanding is correct so please forgive me if > > this doesn't make sense. > > > >>> else > >>> { > >>> /* remain should now be > 0 and < > >>> nunits. */ > >>> > >>>> Thanks, > >>>> Richard > > > > Edwin > > > >>>>>>>>> gcc/ChangeLog: > >>>>>>>>> * gcc/tree-vect-stmts.cc (gcc/tree-vect-stmts.cc): > >>>>>>>>> Prevent divide-by-zero. > >>>>>>>>> * > >>>>>>>>> testsuite/gcc.target/riscv/rvv/autovec/no-segment.c: Remove > >>>>>>>>> dg-ice. > >>>>>>>>> --- > >>>>>>>>> No changes in v3. Depends on the risc-v backend option added in > >>>>>>>>> patch 1 to > >>>>>>>>> trigger the ICE. > >>>>>>>>> --- > >>>>>>>>> gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c | 1 - > >>>>>>>>> gcc/tree-vect-stmts.cc | > >>>>>>>>> 3 ++- > >>>>>>>>> 2 files changed, 2 insertions(+), 2 deletions(-) > >>>>>>>>> > >>>>>>>>> diff --git > >>>>>>>>> a/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c > >>>>>>>>> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c > >>>>>>>>> index dfbe09f01a1..79d03612a22 100644 > >>>>>>>>> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c > >>>>>>>>> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c > >>>>>>>>> @@ -1,6 +1,5 @@ > >>>>>>>>> /* { dg-do compile } */ > >>>>>>>>> /* { dg-options "-march=rv64gcv -mabi=lp64d > >>>>>>>>> -mrvv-vector-bits=scalable -O3 -mno-autovec-segment" } */ > >>>>>>>>> -/* { dg-ice "Floating point exception" } */ > >>>>>>>>> > >>>>>>>>> enum e { c, d }; > >>>>>>>>> enum g { f }; > >>>>>>>>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > >>>>>>>>> index 4219ad832db..34f5736ba00 100644 > >>>>>>>>> --- a/gcc/tree-vect-stmts.cc > >>>>>>>>> +++ b/gcc/tree-vect-stmts.cc > >>>>>>>>> @@ -11558,7 +11558,8 @@ vectorizable_load (vec_info *vinfo, > >>>>>>>>> - (vec_num * j + i) * > >>>>>>>>> nunits); > >>>>>>>>> /* remain should now be > 0 and > >>>>>>>>> < nunits. */ > >>>>>>>>> unsigned num; > >>>>>>>>> - if (constant_multiple_p (nunits, > >>>>>>>>> remain, &num)) > >>>>>>>>> + if (known_gt (remain, 0) > >>>>>>>>> + && constant_multiple_p (nunits, > >>>>>>>>> remain, &num)) > >>>>>>>>> { > >>>>>>>>> tree ptype; > >>>>>>>>> new_vtype > >