https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61559
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Aren't these optimizations actually a pessimization for -mmovbe if the inner
bswap is on a read from memory? Assuming the combined load-and-bswap
instruction is cheap, then e.g. loading two values with bswap on them and
xoring them afterwards might be cheaper than loading the two values, xoring
them and then bswapping the result (because for that final bswap you don't
have a load+bswap instruction).
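To make the two shapes being compared concrete, here is a minimal C sketch. The function names are hypothetical and only for illustration; they are not taken from the bug report or from GCC's sources. Which instructions are actually emitted depends on the GCC version, target and flags, so the comments describe the expectation argued in the comment above, not measured output.

```c
#include <assert.h>
#include <stdint.h>

/* Form A: byte-swap each value as it is loaded, then xor.
   With -mmovbe, each load+bswap pair is a candidate for a single
   movbe instruction, so no separate bswap would be needed. */
uint32_t xor_of_swapped(const uint32_t *p, const uint32_t *q)
{
    return __builtin_bswap32(*p) ^ __builtin_bswap32(*q);
}

/* Form B: xor the raw loads, then byte-swap the result once.
   The bswap now operates on a computed value rather than on a load,
   so it cannot be folded into a load+bswap instruction. */
uint32_t swap_of_xor(const uint32_t *p, const uint32_t *q)
{
    return __builtin_bswap32(*p ^ *q);
}

/* Both forms compute the same value, because a byte swap only
   permutes bytes and xor works on each bit independently. */
void check_forms_agree(uint32_t a, uint32_t b)
{
    assert(xor_of_swapped(&a, &b) == swap_of_xor(&a, &b));
}
```

Compiling both functions with and without -mmovbe (for example with gcc -O2 -S) and comparing the assembly is one way to see whether rewriting Form A into Form B helps or hurts on a given target.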