On Wed, Aug 28, 2024 at 12:43 AM Edwin Lu <e...@rivosinc.com> wrote:
>
> On 8/22/2024 5:35 AM, Richard Biener wrote:
> > On Thu, Aug 22, 2024 at 1:03 AM Edwin Lu <e...@rivosinc.com> wrote:
> >>
> >> Hi,
> >>
> >> Just wanted to ping this for more guidance.
> >
> > It's difficult for me as long as I cannot investigate this with a testcase. 
> >  Can
> > we go ahead with the other parts so the testcase can be added and the
> > issue reproduced?
> >
> > Richard.
>
> The testcase can be found in patch 1/2 of the series
> https://inbox.sourceware.org/gcc-patches/20240530000512.2625173-2-patr...@rivosinc.com/
> with the newly added gcc.target/riscv/rvv/autovec/no-segment.c file.

I did find the test but the issue doesn't reproduce for me with it.

> +cc Jeff, Palmer
>
>  From what I understand reading over the threads on the previous patch
> series versions, review for patch 1/2 has been deferred until this bug
> fix has been acked so we can't move forward with the other patch yet.

I'd say continue with review and checkin, leave the test in FAILing state
and then followup with the fix as we're stuck here otherwise.

Richard.

> Edwin
>
> >> On 7/24/2024 12:03 PM, Edwin Lu wrote:
> >>>
> >>> On 7/24/2024 3:52 AM, Richard Biener wrote:
> >>>> On Wed, Jul 24, 2024 at 1:31 AM Edwin Lu <e...@rivosinc.com> wrote:
> >>>>>
> >>>>> On 7/23/2024 11:20 AM, Richard Sandiford wrote:
> >>>>>> Edwin Lu <e...@rivosinc.com> writes:
> >>>>>>> On 7/23/2024 4:56 AM, Richard Biener wrote:
> >>>>>>>> On Tue, Jul 23, 2024 at 1:03 AM Edwin Lu <e...@rivosinc.com> wrote:
> >>>>>>>>> Hi Richard,
> >>>>>>>>>
> >>>>>>>>> On 5/31/2024 1:48 AM, Richard Biener wrote:
> >>>>>>>>>> On Thu, May 30, 2024 at 2:11 AM Patrick O'Neill
> >>>>>>>>>> <patr...@rivosinc.com> wrote:
> >>>>>>>>>>> From: Greg McGary <g...@rivosinc.com>
> >>>>>>>>>> Still a NACK.  If remain ends up zero then
> >>>>>>>>>>
> >>>>>>>>>>                          /* Try to use a single smaller load when
> >>>>>>>>>> we are about
> >>>>>>>>>>                             to load excess elements compared to
> >>>>>>>>>> the unrolled
> >>>>>>>>>>                             scalar loop.  */
> >>>>>>>>>>                          if (known_gt ((vec_num * j + i + 1) *
> >>>>>>>>>> nunits,
> >>>>>>>>>>                                             (group_size * vf -
> >>>>>>>>>> gap)))
> >>>>>>>>>>                            {
> >>>>>>>>>>                              poly_uint64 remain = ((group_size *
> >>>>>>>>>> vf - gap)
> >>>>>>>>>>                                                    - (vec_num * j
> >>>>>>>>>> + i) * nunits);
> >>>>>>>>>>                              if (known_ge ((vec_num * j + i + 1)
> >>>>>>>>>> * nunits
> >>>>>>>>>>                                            - (group_size * vf -
> >>>>>>>>>> gap), nunits))
> >>>>>>>>>>                                /* DR will be unused.  */
> >>>>>>>>>>                                ltype = NULL_TREE;
> >>>>>>>>>>
> >>>>>>>>>> needs to be re-formulated so that the combined conditions make sure
> >>>>>>>>>> this doesn't happen.  The outer known_gt should already ensure that
> >>>>>>>>>> remain > 0.  For correctness that should possibly be maybe_gt
> >>>>>>>>>> though.
> >>>>>> Yeah.  FWIW, I mentioned the maybe_gt thing in
> >>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653013.html:
> >>>>>>
> >>>>>>      Pre-existing, but shouldn't this be maybe_gt rather than known_gt?
> >>>>>>      We can only skip doing it if we know for sure that the load
> >>>>>> won't cross
> >>>>>>      the gap.  (Not sure whether the difference can trigger in
> >>>>>> practice.)
> >>>>>>
> >>>>>> But AFAICT, the known_gt doesn't inherently prove that remain is known
> >>>>>> to be nonzero.  It just proves that the gap between the end of the
> >>>>>> scalar
> >>>>>> accesses and the end of this vector is known to be nonzero.
> >>>>>>
> >>>>>> [switching round for easier reply]
> >> [removed some clarification questions about poly_int and vector
> >> representations]
> >>>>>>>> What is j and i when the divisor is zero?
> >>>>>>> The values I see in gdb are: vec_num = 4 j = 0 i = 3 vf = {coeffs =
> >>>>>>> {2,
> >>>>>>> 2}} nunits = {coeffs = {2, 2}} group_size = 4 gap = 2 vect_align = 2
> >>>>>>> remain = {coeffs = {0, 2}}
> >>>>>> OK, so let's use D to mean "data" and G to mean "gap".  Then, for the
> >>>>>> minimum vector length of 2 elements, we have:
> >>>>>>
> >>>>>>      DD GG DD GG
> >>>>>>
> >>>>>> The last load will read beyond the scalar loop if the vector loop
> >>>>>> happens
> >>>>>> to handle all elements of the scalar loop.
> >>>>>>
> >>>>>> For a vector length of 4 elements, we have:
> >>>>>>
> >>>>>>      DDGG DDGG DDGG DDGG
> >>>>>>
> >>>>>> where every load contains both data and gaps.  The same will be true
> >>>>>> for larger vectors.
> >>>>>>
> >>>>>> That's where remain={0,2} is coming from.  The last load is fully
> >>>>>> redundant
> >>>>>> for the minimum VL but not for larger VL.
> >>>>>> Based on that, the patch below looks correct to me, but I might have
> >>>>>> misunderstood the intent.
> >>>>> As an alternative to the original patch, would this also make sense?
> >>>>>
> >>>>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> >>>>> index aab3aa59962..cd657ac63af 100644
> >>>>> --- a/gcc/tree-vect-stmts.cc
> >>>>> +++ b/gcc/tree-vect-stmts.cc
> >>>>> @@ -11479,7 +11479,7 @@ vectorizable_load (vec_info *vinfo,
> >>>>>                        /* Try to use a single smaller load when we are
> >>>>> about
> >>>>>                           to load excess elements compared to the
> >>>>> unrolled
> >>>>>                           scalar loop.  */
> >>>>> -                   if (known_gt ((vec_num * j + i + 1) * nunits,
> >>>>> +                   if (maybe_gt ((vec_num * j + i + 1) * nunits,
> >>>>>                                           (group_size * vf - gap)))
> >>>>>                          {
> >>>>>                            poly_uint64 remain = ((group_size * vf - gap)
> >>>> That's a good point - this should indeed be maybe_gt
> >>>>
> >>>>> @@ -11502,6 +11502,10 @@ vectorizable_load (vec_info *vinfo,
> >>>>>                              /* Aligned access to the gap area when
> >>>>> there's
> >>>>>                                 at least one element in it is OK.  */
> >>>>>                              ;
> >>>>> +                       else if (maybe_eq (remain, 0))
> >>>>> +                         /* Handle remain.coeffs[0] == 0 case.
> >>>>> Number of
> >>>>> +                            elements is an exact multiple of vector
> >>>>> length.  */
> >>>>> +                         ;
> >>>> Hmm, but here it should be known_eq?  And why doesn't
> >>>> constant_multiple_p work then?  I think remain.coeffs[1] is never
> >>>> negative here, but that would be another way to arrive at zero.
> >>>
> >>> If I use known_eq, the condition would be skipped and we'll enter the
> >>> final else statement which triggers the floating point exception in
> >>> constant_multiple_p.
> >>>
> >>> I don't know what it would mean for remain.coeffs[1] to be negative.
> >>> To me it doesn't really make sense given my understanding that
> >>> it represents the number of additional chunks beyond the minimum vector
> >>> size. I think it would make sense to assert remain.coeffs[1] >= 0?
> >>>
> >>> I believe it would need to be maybe_eq since if remain.coeffs[1] is
> >>> non-negative, the only way for remain == 0 is if X == 0 and
> >>> remain.coeffs[0] == 0.
> >>>
> >>>> So in principle this would amend
> >>>>
> >>>>                           if (known_ge ((vec_num * j + i + 1) * nunits
> >>>>                                         - (group_size * vf - gap),
> >>>> nunits))
> >>>>                             /* DR will be unused.  */
> >>>>                             ltype = NULL_TREE;
> >>>>
> >>>> ?
> >>>
> >>> If we know remain == 0, then yes, I think it does amend that statement.
> >>> However, since we don't know what X is, we aren't guaranteed that the load
> >>> will eventually not be used.
> >>>
> >>>> Sure the original patch would work, but then the corresponding change
> >>>> in get_group_loadstore_type needs to be done to ensure if remain
> >>>> maybe zero we're forcing peeling for gaps, no?
> >>>
> >>> If remain may be zero, then wouldn't all the data already be loaded into
> >>> vectors during the vector loop? Would we still need to peel for gaps
> >>> in the scalar loop if all the data is processed?
> >>>
> >>> I'm not sure if my understanding is correct so please forgive me if
> >>> this doesn't make sense.
> >>>
> >>>>>                            else
> >>>>>                              {
> >>>>>                                /* remain should now be > 0 and <
> >>>>> nunits.  */
> >>>>>
> >>>>>> Thanks,
> >>>>>> Richard
> >>>
> >>> Edwin
> >>>
> >>>>>>>>>>> gcc/ChangeLog:
> >>>>>>>>>>>              * gcc/tree-vect-stmts.cc (gcc/tree-vect-stmts.cc):
> >>>>>>>>>>> Prevent divide-by-zero.
> >>>>>>>>>>>              *
> >>>>>>>>>>> testsuite/gcc.target/riscv/rvv/autovec/no-segment.c: Remove
> >>>>>>>>>>> dg-ice.
> >>>>>>>>>>> ---
> >>>>>>>>>>> No changes in v3. Depends on the risc-v backend option added in
> >>>>>>>>>>> patch 1 to
> >>>>>>>>>>> trigger the ICE.
> >>>>>>>>>>> ---
> >>>>>>>>>>>       gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c | 1 
> >>>>>>>>>>> -
> >>>>>>>>>>>       gcc/tree-vect-stmts.cc                                  |
> >>>>>>>>>>> 3 ++-
> >>>>>>>>>>>       2 files changed, 2 insertions(+), 2 deletions(-)
> >>>>>>>>>>>
> >>>>>>>>>>> diff --git
> >>>>>>>>>>> a/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
> >>>>>>>>>>> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
> >>>>>>>>>>> index dfbe09f01a1..79d03612a22 100644
> >>>>>>>>>>> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
> >>>>>>>>>>> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
> >>>>>>>>>>> @@ -1,6 +1,5 @@
> >>>>>>>>>>>       /* { dg-do compile } */
> >>>>>>>>>>>       /* { dg-options "-march=rv64gcv -mabi=lp64d
> >>>>>>>>>>> -mrvv-vector-bits=scalable -O3 -mno-autovec-segment" } */
> >>>>>>>>>>> -/* { dg-ice "Floating point exception" } */
> >>>>>>>>>>>
> >>>>>>>>>>>       enum e { c, d };
> >>>>>>>>>>>       enum g { f };
> >>>>>>>>>>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> >>>>>>>>>>> index 4219ad832db..34f5736ba00 100644
> >>>>>>>>>>> --- a/gcc/tree-vect-stmts.cc
> >>>>>>>>>>> +++ b/gcc/tree-vect-stmts.cc
> >>>>>>>>>>> @@ -11558,7 +11558,8 @@ vectorizable_load (vec_info *vinfo,
> >>>>>>>>>>>                                       - (vec_num * j + i) *
> >>>>>>>>>>> nunits);
> >>>>>>>>>>>                                  /* remain should now be > 0 and
> >>>>>>>>>>> < nunits.  */
> >>>>>>>>>>>                                  unsigned num;
> >>>>>>>>>>> -                           if (constant_multiple_p (nunits,
> >>>>>>>>>>> remain, &num))
> >>>>>>>>>>> +                           if (known_gt (remain, 0)
> >>>>>>>>>>> +                               && constant_multiple_p (nunits,
> >>>>>>>>>>> remain, &num))
> >>>>>>>>>>>                                    {
> >>>>>>>>>>>                                      tree ptype;
> >>>>>>>>>>>                                      new_vtype
> >>>
> >
>

Reply via email to