https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99633

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
           Keywords|                            |missed-optimization
             Target|                            |x86_64-*-*
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-03-18

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I guess a heuristic could be to use the available load/store bandwidth (for
streaming loads/stores only) when load/store 'uops' (stmts/insns) dominate
the loop.  In the case of this loop we don't even need an epilogue so that's
a plus as well.

The inner loop could also be split at LEN_1D/2 to make the load of a[LEN_1D/2]
invariant in all but a single iteration (possibly not worth the trouble in this
case).
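
To illustrate the split suggested above, here is a minimal sketch in C. It
assumes the loop has the shape a[i] = a[LEN_1D/2] + b[i]; that shape, the value
of LEN_1D, and the function names loop_reference, loop_split and
split_matches_reference are assumptions made for illustration and are not taken
from the report. The iteration at i == LEN_1D/2 is the only one that overwrites
the loaded element, so it is peeled out and the load is hoisted on either side
of it.

```c
#include <stddef.h>

#define LEN_1D 32000  /* assumed array length, not taken from the report */

/* Assumed original shape: a[LEN_1D/2] is reloaded on every iteration. */
void loop_reference(float *a, const float *b)
{
    for (size_t i = 0; i < LEN_1D; i++)
        a[i] = a[LEN_1D / 2] + b[i];
}

/* Hand-split variant: the load is invariant in each half of the loop. */
void loop_split(float *a, const float *b)
{
    const size_t mid = LEN_1D / 2;
    float pivot = a[mid];            /* value seen by iterations 0..mid */

    for (size_t i = 0; i < mid; i++)
        a[i] = pivot + b[i];

    /* The single iteration that modifies a[mid]. */
    a[mid] = pivot + b[mid];
    pivot = a[mid];                  /* value seen by the remaining iterations */

    for (size_t i = mid + 1; i < LEN_1D; i++)
        a[i] = pivot + b[i];
}

/* Returns 1 if both variants produce identical arrays, 0 otherwise. */
int split_matches_reference(void)
{
    static float a1[LEN_1D], a2[LEN_1D], b[LEN_1D];

    for (size_t i = 0; i < LEN_1D; i++) {
        a1[i] = a2[i] = (float)(i % 7);
        b[i] = (float)(i % 13) * 0.5f;
    }

    loop_reference(a1, b);
    loop_split(a2, b);

    for (size_t i = 0; i < LEN_1D; i++)
        if (a1[i] != a2[i])
            return 0;
    return 1;
}
```

Both variants perform the same floating-point additions in the same order, so
split_matches_reference() can compare the results exactly. Whether the split
pays off is a separate question that this sketch does not measure.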
