https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118276
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|X86_64                      |x86_64-*-*
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
So this is probably a tuning issue in the backend then, thinking (for generic
tuning) that for 11 elements rep stosq is better (size/speed) vs. the unrolled
SSE code. What's faster will ultimately depend on the uarch (some have a
low overhead rep stosq, some do not).
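
For readers without the original testcase at hand, the kind of code this comment is about can be sketched as below. This is a hypothetical illustration, not the reproducer attached to the bug: the names `struct block`, `N_ELEMS`, `clear_block` and `block_is_zero` are made up, and the element type is assumed to be 8 bytes wide so that 11 elements correspond to 11 quadword stores.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example, not taken from the bug report:
   11 elements of 8 bytes each (88 bytes on x86_64). */
enum { N_ELEMS = 11 };

struct block {
    unsigned long long v[N_ELEMS];
};

/* Zero a small fixed-size object. For a constant size like this the
   x86 backend chooses an expansion strategy for the memset (for
   example rep stosq or a sequence of vector stores) based on the
   active tuning. */
void clear_block(struct block *b)
{
    memset(b, 0, sizeof *b);
}

/* Helper to check the result; returns 1 if every element is zero. */
int block_is_zero(const struct block *b)
{
    for (size_t i = 0; i < N_ELEMS; i++)
        if (b->v[i] != 0)
            return 0;
    return 1;
}
```

Compiling this with `gcc -O2 -S` under different `-mtune=` values, or forcing a strategy with `-mstringop-strategy=`, is one way to compare the generated sequences. Which one is actually faster still has to be measured on the target microarchitecture.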