https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61559

--- Comment #10 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 4 Sep 2014, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61559
> 
> Jakub Jelinek <jakub at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |jakub at gcc dot gnu.org
> 
> --- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> Aren't these optimizations actually a pessimization for -mmovbe if the inner
> bswap is on a read from memory?  Assuming the load and bswap instruction is
> cheap, then e.g. loading two values with bswap on them and doing say xor on
> them afterwards might be cheaper than load the two values, xor them and then
> bswap them (because for that bswap you don't have a load+bswap instruction).

Depends on how fast that load+bswap instruction is, I suppose (whether
it plays as nicely with things like store-forwarding and pipelining
as regular loads do, etc.).

That said - what do the optimization guides say on consecutive
movbe instructions vs. non-movbe loads and a bswap instruction?
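
For reference, the two forms being compared could be sketched roughly as
below. This is only an illustration, not code from the PR: the function
names are made up, and whether the first form really ends up as two movbe
loads depends on the target flags and the compiler version. Both functions
return the same value, so only the generated code differs.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical name.  Swap each loaded value, then xor.  With -mmovbe
   each load+bswap pair is a candidate for a single movbe.  */
uint32_t
xor_swap_each (const unsigned char *p, const unsigned char *q)
{
  uint32_t a, b;
  memcpy (&a, p, sizeof a);   /* plain load, no alignment assumption */
  memcpy (&b, q, sizeof b);
  return __builtin_bswap32 (a) ^ __builtin_bswap32 (b);
}

/* Hypothetical name.  Xor first, then one bswap on the result.  The
   bswap now works on a register value, so there is no load to fold
   it into.  */
uint32_t
xor_swap_once (const unsigned char *p, const unsigned char *q)
{
  uint32_t a, b;
  memcpy (&a, p, sizeof a);
  memcpy (&b, q, sizeof b);
  return __builtin_bswap32 (a ^ b);
}
```

Comparing the assembly for the two (e.g. with -O2 -mmovbe) should show
which form the optimization leaves you with.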
