https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70308
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #4)
> The problem is actually in expr.c expanders, in this particular case
> set_storage_via_setmem expander. These expanders loop from narrowest mode to
> wider modes:
>
>   for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode;
>        mode = GET_MODE_WIDER_MODE (mode))
>     {
>       enum insn_code code = direct_optab_handler (setmem_optab, mode);
>
>       if (code != CODE_FOR_nothing
> ...
>
> x86_64 declares "movmem<mode>" expander that defines both, movmemdi *AND*
> movmemsi on x86_64. Since the above loop expands narrowest mode first, it
> first checks movmemsi (which x86_64 uses as well).
>
> Based on the premise that wider-mode expanders are inherently faster, this
> is a middle-end deficiency. Middle-end should reverse mode scanning loop to
> check wider modes first.
>
> Adding some CCs.

Hmm, indeed.  Of course there is no way to iterate from largest to narrowest
mode...  Also one can only hope strict-align targets don't support expanding
via un-aligned larger modes then (though it will only fail "late" during
maybe_expand_insn then).

Btw, how is the changed behavior for x86 based on profile to be explained?
I also wonder why the backend limits itself to emitting SImode stores; it can
simply take the mode setmem is expanded with as a hint, no?
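
To make the scan-order point concrete, here is a minimal standalone sketch in C. It is not GCC code: `toy_mode`, `toy_target`, `pick_narrowest_first` and `pick_widest_first` are hypothetical names, and the model only covers the "is there a handler for this mode" check, not the further conditions elided in the quoted loop. It also uses a plain enum, so walking downwards is trivial here, unlike with the GET_MODE_WIDER_MODE chain discussed above.

```c
/* Hypothetical stand-ins for integer machine modes, narrowest first.
   These are illustration-only names, not GCC's machine_mode values.  */
enum toy_mode { TOY_QI, TOY_HI, TOY_SI, TOY_DI, TOY_NUM_MODES };

/* Nonzero where a (hypothetical) target provides an expander for the
   mode; this models only the "code != CODE_FOR_nothing" test.  */
struct toy_target
{
  int has_expander[TOY_NUM_MODES];
};

/* Scan order of the quoted loop: the narrowest mode with an expander
   wins.  Returns the mode index, or -1 if no expander exists.  */
int
pick_narrowest_first (const struct toy_target *t)
{
  for (int m = 0; m < TOY_NUM_MODES; m++)
    if (t->has_expander[m])
      return m;
  return -1;
}

/* Reversed scan order as proposed in comment #4: the widest mode with
   an expander wins.  Returns the mode index, or -1 if none exists.  */
int
pick_widest_first (const struct toy_target *t)
{
  for (int m = TOY_NUM_MODES - 1; m >= 0; m--)
    if (t->has_expander[m])
      return m;
  return -1;
}
```

With a target that provides both the SI and DI variants (as described for x86_64 above), the first function picks TOY_SI and the second picks TOY_DI.  The sketch says nothing about the strict-align concern, which would still need a separate check.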