https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81496

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |x86_64-*-*, i?86-*-*

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
We should definitely aim at comment#2; everything else will be notoriously slow
because of STLF (store-to-load forwarding) not working.  So the "win" from
using a 256-bit move will be marginal at best (code size).

As for first building xmms and then merging them, ICC always uses a series of
inserts into the final ymm.  Bulldozer/Zen might benefit from the xmm variant
though.