https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118276

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|X86_64                      |x86_64-*-*
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
So this is probably a tuning issue in the backend then, thinking (for generic
tuning) that for 11 elements rep stosq is better (size/speed) vs. the
unrolled SSE code.

What's faster will ultimately depend on the uarch (some have a low-overhead
rep stosq, some do not).
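
For illustration only, a minimal sketch of the kind of code where this choice
comes up. It is not the testcase from this PR: the struct name, the element
type and the function name are assumptions. It clears 11 eight-byte elements
(88 bytes), which the x86_64 backend may expand either as rep stosq or as a
sequence of unrolled SSE stores, depending on the tuning in effect.

```c
#include <string.h>

/* Hypothetical 11-element block (11 * 8 = 88 bytes); an assumption for
   illustration, not taken from the bug report.  */
struct block
{
  unsigned long long v[11];
};

/* Zero the whole block.  With optimization enabled the memset is usually
   expanded inline; whether that becomes rep stosq or unrolled vector
   stores is the backend tuning decision discussed above.  */
void
clear_block (struct block *b)
{
  memset (b, 0, sizeof *b);
}
```

Comparing the assembly from -O2 with different -mtune= values, or forcing a
strategy with -mstringop-strategy=rep_8byte vs. -mstringop-strategy=vector_loop,
should show both expansions, though the exact output depends on the GCC
version.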
