https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117558
Bug ID: 117558
Summary: peeling for gap overrun check imprecise for VLA
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---

RISC-V FAILs:

FAIL: gcc.target/riscv/rvv/autovec/struct/struct_vect-17.c scan-assembler-times vlseg4e64\\.v 1
FAIL: gcc.target/riscv/rvv/autovec/struct/struct_vect-18.c scan-assembler-times vlseg4e64\\.v 1

The relevant check is

          /* Peeling for gaps assumes that a single scalar iteration
             is enough to make sure the last vector iteration doesn't
             access excess elements.  */
          if (overrun_p
              && (!can_div_trunc_p (group_size
                                    * LOOP_VINFO_VECT_FACTOR (loop_vinfo) - gap,
                                    nunits, &tem, &remain)
                  || maybe_lt (remain + group_size, nunits)))
            {
              /* But peeling a single scalar iteration is enough if
                 we can use the next power-of-two sized partial
                 access and that is sufficiently small to be covered
                 by the single scalar iteration.  */
              unsigned HOST_WIDE_INT cnunits, cvf, cremain, cpart_size;
              if (!nunits.is_constant (&cnunits)
                  || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&cvf)
                  || (((cremain = group_size * cvf - gap % cnunits), true)
                      && ((cpart_size = (1 << ceil_log2 (cremain))) != cnunits)
                      && (cremain + group_size < cpart_size
                          || vector_vector_composition_type
                               (vectype, cnunits / cpart_size,
                                &half_vtype) == NULL_TREE)))
                {
                  if (dump_enabled_p ())
                    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                                     "peeling for gaps insufficient for "
                                     "access\n");
                  return false;

But with RVVM1DF we have group_size == 4, gap == 3, VF [2, 2] and nunits
[2, 2] which yields a failure to can_div_trunc_p of [5, 8] by [2, 2].  For
RVVM1SF and VF [4, 4] (same group/gap) and nunits [4, 4] can_div_trunc_p
of [13, 16] by [4, 4] succeeds.

I'll note the non-SLP path lacks the above correctness check.

So the thing we're missing here is that when nunits < group_size the
maybe_lt (remain + group_size, nunits) check is never true.
Of course maybe_gt (nunits, group_size) holds, but with, say, a VF of two the division would succeed. I wonder how to improve the check for this case.
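
To make the two cases above easier to replay, here is a rough C++17 sketch that models a two-coefficient poly_int [c0, c1] as c0 + c1 * x for a runtime x >= 0 and evaluates the outer condition by sampling x. Everything in it (Poly, DivResult, kMaxX, model_can_div_trunc, model_maybe_lt, model_outer_check_trips) is a hypothetical name made up for illustration. It is not GCC's poly_int implementation, and GCC's real can_div_trunc_p is an algebraic check rather than a sampled one, so treat this only as an aid for the arithmetic.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Toy stand-in for a two-coefficient poly_int: value(x) = c0 + c1 * x,
// where x >= 0 is the runtime vector-length multiplier.
// Hypothetical type for illustration; this is not GCC's poly_int.
struct Poly {
  int64_t c0, c1;
  int64_t at(int64_t x) const { return c0 + c1 * x; }
};

struct DivResult {
  int64_t quotient;  // constant quotient, playing the role of `tem`
  Poly remainder;    // playing the role of `remain`
};

constexpr int64_t kMaxX = 64;  // sampling bound, an arbitrary assumption

// Sampled approximation of can_div_trunc_p: succeeds only if the truncating
// quotient a(x) / b(x) is the same constant for every sampled x.
std::optional<DivResult> model_can_div_trunc(Poly a, Poly b) {
  assert(b.at(0) > 0 && b.c1 >= 0);  // divisor must stay positive
  const int64_t q = a.at(0) / b.at(0);
  for (int64_t x = 1; x <= kMaxX; ++x)
    if (a.at(x) / b.at(x) != q)
      return std::nullopt;
  // With a constant quotient the remainder is a - q * b, coefficient-wise.
  return DivResult{q, Poly{a.c0 - q * b.c0, a.c1 - q * b.c1}};
}

// Sampled approximation of maybe_lt: true if a(x) < b(x) for some sampled x.
bool model_maybe_lt(Poly a, Poly b) {
  for (int64_t x = 0; x <= kMaxX; ++x)
    if (a.at(x) < b.at(x))
      return true;
  return false;
}

// Mirrors the outer condition of the quoted check, with overrun_p assumed true:
//   !can_div_trunc_p (group_size * VF - gap, nunits, &tem, &remain)
//   || maybe_lt (remain + group_size, nunits)
bool model_outer_check_trips(int64_t group_size, int64_t gap,
                             Poly vf, Poly nunits) {
  const Poly accessed{group_size * vf.c0 - gap, group_size * vf.c1};
  const auto div = model_can_div_trunc(accessed, nunits);
  if (!div)
    return true;  // the division has no constant quotient
  const Poly padded{div->remainder.c0 + group_size, div->remainder.c1};
  return model_maybe_lt(padded, nunits);
}
```

Under this model, model_outer_check_trips (4, 3, Poly{2, 2}, Poly{2, 2}) is true because [5, 8] divided by [2, 2] has quotient 2 at x = 0 but 3 at x = 1, while model_outer_check_trips (4, 3, Poly{4, 4}, Poly{4, 4}) is false: [13, 16] by [4, 4] gives quotient 3 with remainder [1, 4], and [5, 4] is never below [4, 4].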