On Thu, Aug 22, 2024 at 1:03 AM Edwin Lu <e...@rivosinc.com> wrote:
>
> Hi,
>
> Just wanted to ping this for more guidance.

It's difficult for me as long as I cannot investigate this with a testcase.  Can
we go ahead with the other parts so the testcase can be added and the
issue reproduced?

Richard.

> Edwin
>
> On 7/24/2024 12:03 PM, Edwin Lu wrote:
> >
> > On 7/24/2024 3:52 AM, Richard Biener wrote:
> >> On Wed, Jul 24, 2024 at 1:31 AM Edwin Lu <e...@rivosinc.com> wrote:
> >>>
> >>> On 7/23/2024 11:20 AM, Richard Sandiford wrote:
> >>>> Edwin Lu <e...@rivosinc.com> writes:
> >>>>> On 7/23/2024 4:56 AM, Richard Biener wrote:
> >>>>>> On Tue, Jul 23, 2024 at 1:03 AM Edwin Lu <e...@rivosinc.com> wrote:
> >>>>>>> Hi Richard,
> >>>>>>>
> >>>>>>> On 5/31/2024 1:48 AM, Richard Biener wrote:
> >>>>>>>> On Thu, May 30, 2024 at 2:11 AM Patrick O'Neill
> >>>>>>>> <patr...@rivosinc.com> wrote:
> >>>>>>>>> From: Greg McGary <g...@rivosinc.com>
> >>>>>>>> Still a NACK.  If remain ends up zero then
> >>>>>>>>
> >>>>>>>>                         /* Try to use a single smaller load when
> >>>>>>>> we are about
> >>>>>>>>                            to load excess elements compared to
> >>>>>>>> the unrolled
> >>>>>>>>                            scalar loop.  */
> >>>>>>>>                         if (known_gt ((vec_num * j + i + 1) *
> >>>>>>>> nunits,
> >>>>>>>>                                            (group_size * vf -
> >>>>>>>> gap)))
> >>>>>>>>                           {
> >>>>>>>>                             poly_uint64 remain = ((group_size *
> >>>>>>>> vf - gap)
> >>>>>>>>                                                   - (vec_num * j
> >>>>>>>> + i) * nunits);
> >>>>>>>>                             if (known_ge ((vec_num * j + i + 1)
> >>>>>>>> * nunits
> >>>>>>>>                                           - (group_size * vf -
> >>>>>>>> gap), nunits))
> >>>>>>>>                               /* DR will be unused.  */
> >>>>>>>>                               ltype = NULL_TREE;
> >>>>>>>>
> >>>>>>>> needs to be re-formulated so that the combined conditions make sure
> >>>>>>>> this doesn't happen.  The outer known_gt should already ensure that
> >>>>>>>> remain > 0.  For correctness that should possibly be maybe_gt
> >>>>>>>> though.
> >>>> Yeah.  FWIW, I mentioned the maybe_gt thing in
> >>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653013.html:
> >>>>
> >>>>     Pre-existing, but shouldn't this be maybe_gt rather than known_gt?
> >>>>     We can only skip doing it if we know for sure that the load
> >>>> won't cross
> >>>>     the gap.  (Not sure whether the difference can trigger in
> >>>> practice.)
> >>>>
> >>>> But AFAICT, the known_gt doesn't inherently prove that remain is known
> >>>> to be nonzero.  It just proves that the gap between the end of the
> >>>> scalar
> >>>> accesses and the end of this vector is known to be nonzero.
> >>>>
> >>>> [switching round for easier reply]
> [removed some clarification questions about poly_int and vector
> representations]
> >>>>>> What is j and i when the divisor is zero?
> >>>>> The values I see in gdb are: vec_num = 4 j = 0 i = 3 vf = {coeffs =
> >>>>> {2,
> >>>>> 2}} nunits = {coeffs = {2, 2}} group_size = 4 gap = 2 vect_align = 2
> >>>>> remain = {coeffs = {0, 2}}
> >>>> OK, so let's use D to mean "data" and G to mean "gap".  Then, for the
> >>>> minimum vector length of 2 elements, we have:
> >>>>
> >>>>     DD GG DD GG
> >>>>
> >>>> The last load will read beyond the scalar loop if the vector loop
> >>>> happens
> >>>> to handle all elements of the scalar loop.
> >>>>
> >>>> For a vector length of 4 elements, we have:
> >>>>
> >>>>     DDGG DDGG DDGG DDGG
> >>>>
> >>>> where every load contains both data and gaps.  The same will be true
> >>>> for larger vectors.
> >>>>
> >>>> That's where remain={0,2} is coming from.  The last load is fully
> >>>> redundant
> >>>> for the minimum VL but not for larger VL.
> >>>> Based on that, the patch below looks correct to me, but I might have
> >>>> misunderstood the intent.
> >>> As an alternative to the original patch, would this also make sense?
> >>>
> >>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> >>> index aab3aa59962..cd657ac63af 100644
> >>> --- a/gcc/tree-vect-stmts.cc
> >>> +++ b/gcc/tree-vect-stmts.cc
> >>> @@ -11479,7 +11479,7 @@ vectorizable_load (vec_info *vinfo,
> >>>                       /* Try to use a single smaller load when we are
> >>> about
> >>>                          to load excess elements compared to the
> >>> unrolled
> >>>                          scalar loop.  */
> >>> -                   if (known_gt ((vec_num * j + i + 1) * nunits,
> >>> +                   if (maybe_gt ((vec_num * j + i + 1) * nunits,
> >>>                                          (group_size * vf - gap)))
> >>>                         {
> >>>                           poly_uint64 remain = ((group_size * vf - gap)
> >> That's a good point - this should indeed be maybe_gt
> >>
> >>> @@ -11502,6 +11502,10 @@ vectorizable_load (vec_info *vinfo,
> >>>                             /* Aligned access to the gap area when
> >>> there's
> >>>                                at least one element in it is OK.  */
> >>>                             ;
> >>> +                       else if (maybe_eq (remain, 0))
> >>> +                         /* Handle remain.coeffs[0] == 0 case.
> >>> Number of
> >>> +                            elements is an exact multiple of vector
> >>> length.  */
> >>> +                         ;
> >> Hmm, but here it should be known_eq?  And why doesn't
> >> constant_multiple_p work then?  I think remain.coeffs[1] is never
> >> negative here, but that would be another way to arrive at zero.
> >
> > If I use known_eq, the condition would be skipped and we'll enter the
> > final else statement which triggers the floating point exception in
> > constant_multiple_p.
> >
> > I don't know what it would mean for remain.coeffs[1] to be negative.
> > To me it doesn't really make sense given my understanding that
> > it represents the number of additional chunks beyond the minimum vector
> > size. I think it would make sense to assert remain.coeffs[1] >= 0?
> >
> > I believe it would need to be maybe_eq since if remain.coeffs[1] is
> > non-negative, the only way for remain == 0 is if X == 0 and
> > remain.coeffs[0] == 0.
> >
> >> So in principle this would amend
> >>
> >>                          if (known_ge ((vec_num * j + i + 1) * nunits
> >>                                        - (group_size * vf - gap),
> >> nunits))
> >>                            /* DR will be unused.  */
> >>                            ltype = NULL_TREE;
> >>
> >> ?
> >
> > If we know remain == 0, then yes, I think it does amend that statement.
> > However, since we don't know what X is, we aren't guaranteed that the load
> > will eventually not be used.
> >
> >> Sure the original patch would work, but then the corresponding change
> >> in get_group_loadstore_type needs to be done to ensure if remain
> >> maybe zero we're forcing peeling for gaps, no?
> >
> > If remain may be zero, then wouldn't all the data already be loaded into
> > vectors during the vector loop? Would we still need to peel for gaps
> > in the scalar loop if all the data is processed?
> >
> > I'm not sure if my understanding is correct so please forgive me if
> > this doesn't make sense.
> >
> >>>                           else
> >>>                             {
> >>>                               /* remain should now be > 0 and <
> >>> nunits.  */
> >>>
> >>>> Thanks,
> >>>> Richard
> >
> > Edwin
> >
> >>>>>>>>> gcc/ChangeLog:
> >>>>>>>>>             * gcc/tree-vect-stmts.cc (gcc/tree-vect-stmts.cc):
> >>>>>>>>> Prevent divide-by-zero.
> >>>>>>>>>             *
> >>>>>>>>> testsuite/gcc.target/riscv/rvv/autovec/no-segment.c: Remove
> >>>>>>>>> dg-ice.
> >>>>>>>>> ---
> >>>>>>>>> No changes in v3. Depends on the risc-v backend option added in
> >>>>>>>>> patch 1 to
> >>>>>>>>> trigger the ICE.
> >>>>>>>>> ---
> >>>>>>>>>      gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c | 1 -
> >>>>>>>>>      gcc/tree-vect-stmts.cc                                  |
> >>>>>>>>> 3 ++-
> >>>>>>>>>      2 files changed, 2 insertions(+), 2 deletions(-)
> >>>>>>>>>
> >>>>>>>>> diff --git
> >>>>>>>>> a/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
> >>>>>>>>> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
> >>>>>>>>> index dfbe09f01a1..79d03612a22 100644
> >>>>>>>>> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
> >>>>>>>>> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
> >>>>>>>>> @@ -1,6 +1,5 @@
> >>>>>>>>>      /* { dg-do compile } */
> >>>>>>>>>      /* { dg-options "-march=rv64gcv -mabi=lp64d
> >>>>>>>>> -mrvv-vector-bits=scalable -O3 -mno-autovec-segment" } */
> >>>>>>>>> -/* { dg-ice "Floating point exception" } */
> >>>>>>>>>
> >>>>>>>>>      enum e { c, d };
> >>>>>>>>>      enum g { f };
> >>>>>>>>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> >>>>>>>>> index 4219ad832db..34f5736ba00 100644
> >>>>>>>>> --- a/gcc/tree-vect-stmts.cc
> >>>>>>>>> +++ b/gcc/tree-vect-stmts.cc
> >>>>>>>>> @@ -11558,7 +11558,8 @@ vectorizable_load (vec_info *vinfo,
> >>>>>>>>>                                      - (vec_num * j + i) *
> >>>>>>>>> nunits);
> >>>>>>>>>                                 /* remain should now be > 0 and
> >>>>>>>>> < nunits.  */
> >>>>>>>>>                                 unsigned num;
> >>>>>>>>> -                           if (constant_multiple_p (nunits,
> >>>>>>>>> remain, &num))
> >>>>>>>>> +                           if (known_gt (remain, 0)
> >>>>>>>>> +                               && constant_multiple_p (nunits,
> >>>>>>>>> remain, &num))
> >>>>>>>>>                                   {
> >>>>>>>>>                                     tree ptype;
> >>>>>>>>>                                     new_vtype
> >

Reply via email to