Edwin Lu <e...@rivosinc.com> writes: > On 7/23/2024 4:56 AM, Richard Biener wrote: >> On Tue, Jul 23, 2024 at 1:03 AM Edwin Lu <e...@rivosinc.com> wrote: >>> Hi Richard, >>> >>> On 5/31/2024 1:48 AM, Richard Biener wrote: >>>> On Thu, May 30, 2024 at 2:11 AM Patrick O'Neill <patr...@rivosinc.com> >>>> wrote: >>>>> From: Greg McGary <g...@rivosinc.com> >>>> Still a NACK. If remain ends up zero then >>>> >>>> /* Try to use a single smaller load when we are about >>>> to load excess elements compared to the unrolled >>>> scalar loop. */ >>>> if (known_gt ((vec_num * j + i + 1) * nunits, >>>> (group_size * vf - gap))) >>>> { >>>> poly_uint64 remain = ((group_size * vf - gap) >>>> - (vec_num * j + i) * >>>> nunits); >>>> if (known_ge ((vec_num * j + i + 1) * nunits >>>> - (group_size * vf - gap), nunits)) >>>> /* DR will be unused. */ >>>> ltype = NULL_TREE; >>>> >>>> needs to be re-formulated so that the combined conditions make sure >>>> this doesn't happen. The outer known_gt should already ensure that >>>> remain > 0. For correctness that should possibly be maybe_gt though.
Yeah. FWIW, I mentioned the maybe_gt thing in https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653013.html: Pre-existing, but shouldn't this be maybe_gt rather than known_gt? We can only skip doing it if we know for sure that the load won't cross the gap. (Not sure whether the difference can trigger in practice.) But AFAICT, the known_gt doesn't inherently prove that remain is known to be nonzero. It just proves that the gap between the end of the scalar accesses and the end of this vector is known to be nonzero. >>>> >> Putting the list back in the loop and CCing Richard S. >> >>> I'm currently looking into this patch and am trying to figure out what >>> is going on. Stepping through gdb, I see that remain == {coeffs = {0, >>> 2}} and nunits == {coeffs = {2, 2}} (the outer known_gt returned true >>> with known_gt({coeffs = {8, 8}}, {coeffs = {6, 8}})). >>> >>> From what I understand, this falls under the umbrella of 0 <= remain < >>> nunits. The divide by zero error is because of the 0 <= remain which is >>> coming from the constant_multiple_p function in poly-int.h where it >>> performs the modulus NCa(a.coeffs[0]) % NCb(b.coeffs[0]). >>> (https://github.com/gcc-mirror/gcc/blob/master/gcc/poly-int.h#L1970-L1971) >>> >>> >>> > if (known_ge ((vec_num * j + i + 1) * nunits >>> > - (group_size * vf - gap), >>> nunits)) >>> > /* DR will be unused. */ >>> > ltype = NULL_TREE; >>> >>> This if condition is a bit suspicious to me though. I'm seeing that it's >>> evaluating known_ge({coeffs = {2, 0}}, {coeffs = {2, 2}}) which is >>> returning false. Should it be maybe_ge instead? >> No, we can only not emit a load if we know it won't be used, not if >> it eventually cannot be used. Agreed. [switching round for easier reply] >>> After running some >>> tests, to me it looks like it doesn't vectorize quite as often; however, >>> I'm not fully sure what else to do when the coeffs can potentially be >>> equal to 0. >>> >>> Should it even be possible for there to be a {coeffs = {0, n}} >>> situation? My understanding of how poly_ints are used for representing >>> vectorization is that the first coefficient is the number of elements >>> needed to make the minimum supported vector size. That is, if vector >>> lengths are 128 bits, element size is 32 bits, coeff[0] should be >>> minimum of 4. Is this understanding correct? >> I was told n can be negative, but nunits.coeff[0] should be non-zero. > > What would it mean for the coeffs[0] to be 0? Would that mean the vector > length supports 0 bits? coeffs = {A,B} just means A+B*X, where X is the number of vector "chunks" beyond the minimum length. It's certainly valid for a poly_int to have a zero coeffs[0] (i.e. zero A). For example, (the length of a vector) - (the minimum length) would have this property. >> >> What is j and i when the divisor is zero? > > The values I see in gdb are: vec_num = 4 j = 0 i = 3 vf = {coeffs = {2, > 2}} nunits = {coeffs = {2, 2}} group_size = 4 gap = 2 vect_align = 2 > remain = {coeffs = {0, 2}} OK, so let's use D to mean "data" and G to mean "gap". Then, for the minimum vector length of 2 elements, we have: DD GG DD GG The last load will read beyond the scalar loop if the vector loop happens to handle all elements of the scalar loop. For a vector length of 4 elements, we have: DDGG DDGG DDGG DDGG where every load contains both data and gaps. The same will be true for larger vectors. That's where remain={0,2} is coming from. The last load is fully redundant for the minimum VL but not for larger VL. Based on that, the patch below looks correct to me, but I might have misunderstood the intent. Thanks, Richard >>>>> gcc/ChangeLog: >>>>> * gcc/tree-vect-stmts.cc (gcc/tree-vect-stmts.cc): Prevent >>>>> divide-by-zero. >>>>> * testsuite/gcc.target/riscv/rvv/autovec/no-segment.c: Remove >>>>> dg-ice. >>>>> --- >>>>> No changes in v3. Depends on the risc-v backend option added in patch 1 to >>>>> trigger the ICE. >>>>> --- >>>>> gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c | 1 - >>>>> gcc/tree-vect-stmts.cc | 3 ++- >>>>> 2 files changed, 2 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c >>>>> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c >>>>> index dfbe09f01a1..79d03612a22 100644 >>>>> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c >>>>> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c >>>>> @@ -1,6 +1,5 @@ >>>>> /* { dg-do compile } */ >>>>> /* { dg-options "-march=rv64gcv -mabi=lp64d -mrvv-vector-bits=scalable >>>>> -O3 -mno-autovec-segment" } */ >>>>> -/* { dg-ice "Floating point exception" } */ >>>>> >>>>> enum e { c, d }; >>>>> enum g { f }; >>>>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc >>>>> index 4219ad832db..34f5736ba00 100644 >>>>> --- a/gcc/tree-vect-stmts.cc >>>>> +++ b/gcc/tree-vect-stmts.cc >>>>> @@ -11558,7 +11558,8 @@ vectorizable_load (vec_info *vinfo, >>>>> - (vec_num * j + i) * nunits); >>>>> /* remain should now be > 0 and < nunits. >>>>> */ >>>>> unsigned num; >>>>> - if (constant_multiple_p (nunits, remain, >>>>> &num)) >>>>> + if (known_gt (remain, 0) >>>>> + && constant_multiple_p (nunits, remain, >>>>> &num)) >>>>> { >>>>> tree ptype; >>>>> new_vtype >>>>> -- >>>>> 2.43.2 >>>>> >>> Edwin >>>