https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117733

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |riscv
          Component|middle-end                  |tree-optimization
             Blocks|                            |26163, 53947

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
The inner loop is unrolled and we select a [2,2] VF as the group size is 5:

t.f90:12:20: note:   Detected interleaving load of size 5
t.f90:12:20: note:      _31 = (*q_18(D))[_30];
t.f90:12:20: note:      _44 = (*q_18(D))[_43];
t.f90:12:20: note:      _57 = (*q_18(D))[_56];
t.f90:12:20: note:      _70 = (*q_18(D))[_69];
t.f90:12:20: note:      _83 = (*q_18(D))[_82];

I think what's needed for your idea to work is basically re-rolling the loop;
I don't see how we can otherwise deal with this absent a vector mode
with [10,2]?  Note that the re-rolling can take place "virtually" inside the
vectorizer: we'd use a fractional VF to get us to group size 1.
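
To illustrate what re-rolling means at the source level, here is a minimal C
sketch. It is not the t.f90 test case from this report: the function names
sum_unrolled and sum_rerolled, the array q as a flat array of doubles, and the
summation itself are assumptions made up for the example. The first function
has five adjacent loads per iteration, which is the shape that would be
detected as an interleaving load of size 5; the second is the re-rolled
equivalent with a single load per iteration, i.e. group size 1.

```c
#include <stddef.h>

/* Hypothetical unrolled form: five adjacent loads per iteration, which
   would be seen as one interleaved load group of size 5. */
double sum_unrolled(const double *q, size_t n_groups)
{
    double s = 0.0;
    for (size_t i = 0; i < n_groups; ++i) {
        s += q[5 * i + 0];
        s += q[5 * i + 1];
        s += q[5 * i + 2];
        s += q[5 * i + 3];
        s += q[5 * i + 4];
    }
    return s;
}

/* Hypothetical re-rolled form: the same elements are visited in the same
   order, but with one load per iteration, i.e. group size 1. */
double sum_rerolled(const double *q, size_t n_groups)
{
    double s = 0.0;
    for (size_t j = 0; j < 5 * n_groups; ++j)
        s += q[j];
    return s;
}
```

Both functions add the same elements in the same order, so they return
identical results. The "virtual" re-rolling mentioned above would happen
inside the vectorizer rather than as a source transformation like this one.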


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
