https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117733
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |riscv
          Component|middle-end                  |tree-optimization
             Blocks|                            |26163, 53947

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
The inner loop is unrolled and we select a [2,2] VF as the group size is 5:

t.f90:12:20: note:   Detected interleaving load of size 5
t.f90:12:20: note:      _31 = (*q_18(D))[_30];
t.f90:12:20: note:      _44 = (*q_18(D))[_43];
t.f90:12:20: note:      _57 = (*q_18(D))[_56];
t.f90:12:20: note:      _70 = (*q_18(D))[_69];
t.f90:12:20: note:      _83 = (*q_18(D))[_82];

I think what's needed for your idea to work is basically re-rolling the loop,
I don't see how we can otherwise deal with this absent a vector mode with
[10,2]?  Note the re-rolling can take place "virtually" inside the vectorizer,
we'd use a fractional VF to get us to group size 1.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
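
Illustrative note on the re-rolling mentioned in comment #2: the following is only a rough model, not the testcase from this bug and not vectorizer code. It assumes the five grouped loads read adjacent elements of `q`, and the function names, the constant `GROUP`, and the doubling operation are all hypothetical. It shows that a loop body carrying five copies of one statement (group size 5) computes the same thing as a single-statement loop over five times as many iterations (group size 1).

```python
GROUP = 5  # group size reported in the vectorizer dump above


def unrolled(q, n):
    """Loop whose body holds five copies of the statement (group size 5)."""
    out = []
    for i in range(n):
        base = GROUP * i
        # Five loads per iteration, assumed to hit adjacent elements.
        out.append(2 * q[base + 0])
        out.append(2 * q[base + 1])
        out.append(2 * q[base + 2])
        out.append(2 * q[base + 3])
        out.append(2 * q[base + 4])
    return out


def rerolled(q, n):
    """Same computation with one statement per iteration (group size 1)."""
    out = []
    for k in range(GROUP * n):
        out.append(2 * q[k])
    return out
```

In the re-rolled form each scalar iteration touches a single element, which is the "group size 1" shape the comment refers to; how the vectorizer would express this internally with a fractional VF is not modelled here.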