https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61559
--- Comment #10 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 4 Sep 2014, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61559
>
> Jakub Jelinek <jakub at gcc dot gnu.org> changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |jakub at gcc dot gnu.org
>
> --- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> Aren't these optimizations actually a pessimization for -mmovbe if the inner
> bswap is on a read from memory? Assuming the load and bswap instruction is
> cheap, then e.g. loading two values with bswap on them and doing say xor on
> them afterwards might be cheaper than load the two values, xor them and then
> bswap them (because for that bswap you don't have a load+bswap instruction).

I suppose that depends on how fast the load+bswap instruction is (whether
it plays nicely with things like store-forwarding and pipelines as well
as regular loads do, etc.).

That said, what do the optimization guides say about consecutive movbe
instructions vs. non-movbe loads followed by a bswap instruction?