https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117558

            Bug ID: 117558
           Summary: peeling for gap overrun check imprecise for VLA
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

RISC-V FAILs:

FAIL: gcc.target/riscv/rvv/autovec/struct/struct_vect-17.c scan-assembler-times vlseg4e64\\.v 1
FAIL: gcc.target/riscv/rvv/autovec/struct/struct_vect-18.c scan-assembler-times vlseg4e64\\.v 1

The relevant check is

          /* Peeling for gaps assumes that a single scalar iteration
             is enough to make sure the last vector iteration doesn't
             access excess elements.  */
          if (overrun_p
              && (!can_div_trunc_p (group_size
                                    * LOOP_VINFO_VECT_FACTOR (loop_vinfo) - gap,
                                    nunits, &tem, &remain)
                  || maybe_lt (remain + group_size, nunits)))
            {
              /* But peeling a single scalar iteration is enough if
                 we can use the next power-of-two sized partial
                 access and that is sufficiently small to be covered
                 by the single scalar iteration.  */
              unsigned HOST_WIDE_INT cnunits, cvf, cremain, cpart_size;
              if (!nunits.is_constant (&cnunits)
                  || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&cvf)
                  || (((cremain = group_size * cvf - gap % cnunits), true)
                      && ((cpart_size = (1 << ceil_log2 (cremain))) != cnunits)
                      && (cremain + group_size < cpart_size
                          || vector_vector_composition_type
                               (vectype, cnunits / cpart_size,
                                &half_vtype) == NULL_TREE)))
                {
                  if (dump_enabled_p ())
                    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                                     "peeling for gaps insufficient for "
                                     "access\n");
                  return false;

But with RVVM1DF we have group_size == 4, gap == 3, VF [2, 2] and nunits [2, 2],
so can_div_trunc_p of [5, 8] by [2, 2] fails.

For RVVM1SF and VF [4, 4] (same group/gap) and nunits [4, 4] can_div_trunc_p
of [13, 16] by [4, 4] succeeds.
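
To make the two cases above concrete, here is a minimal brute-force sketch,
not GCC's implementation.  It assumes [c0, c1] denotes the runtime value
c0 + c1 * x for an unknown x >= 0, and it treats the division as succeeding
only if the truncated quotient is the same for every sampled x.  The names
Poly1 and const_trunc_quotient are hypothetical and exist only for this
illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a two-coefficient poly_int:
// the runtime value is c0 + c1 * x for some unknown x >= 0.
struct Poly1
{
  int64_t c0;
  int64_t c1;
  int64_t at (int64_t x) const { return c0 + c1 * x; }
};

// Brute-force model (an assumption, not GCC code): return true and set
// *quotient if trunc (a / b) is the same constant for every sampled
// x in [0, max_x]; return false as soon as two samples disagree.
bool
const_trunc_quotient (Poly1 a, Poly1 b, int64_t *quotient,
                      int64_t max_x = 64)
{
  assert (b.at (0) > 0);
  int64_t q = a.at (0) / b.at (0);
  for (int64_t x = 1; x <= max_x; ++x)
    {
      assert (b.at (x) > 0);
      if (a.at (x) / b.at (x) != q)
        return false;
    }
  *quotient = q;
  return true;
}
```

Under this model, dividing {5, 8} by {2, 2} fails because the quotient is 2
at x == 0 but 3 at x == 1, while dividing {13, 16} by {4, 4} gives
quotient 3 for every sampled x.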

I'll note the non-SLP path lacks the above correctness check.

So the thing we're missing here is that when nunits < group_size the
maybe_lt (remain + group_size, nunits) check is never true.  Of course
maybe_gt (nunits, group_size) holds, but with, say, a VF of two the division
would succeed.

I wonder how to improve the check for this case.
