http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52624
--- Comment #10 from Uros Bizjak <ubizjak at gmail dot com> 2012-03-22 16:59:43 UTC ---
(In reply to comment #9)
> Do we need to optimize for partial register stall?

xchg is enabled only for Pentium4, and that is not a partial register stall target.

BTW: According to the docs, rol/ror on P4 have a latency of 4 cycles plus a false flags dependency, whereas xchg has a latency of 1.5 cycles.
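
For readers following along, here is a minimal C sketch of the two equivalent ways to swap the bytes of a 16-bit value that the instructions above correspond to: a rotate by 8 bits (the rol/ror form) and an exchange of the low and high bytes (the xchg form). The function names swap16_rot and swap16_xchg are made up for illustration and do not come from GCC; which instruction a compiler actually emits for either one depends on its tuning settings.

```c
#include <stdint.h>

/* Hypothetical helper: byte swap expressed as a 16-bit rotate by 8.
   On x86 this shape corresponds to the rol/ror $8 form. */
static uint16_t swap16_rot(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Hypothetical helper: byte swap expressed as an exchange of the low
   and high bytes, the operation an xchg of two byte registers performs. */
static uint16_t swap16_xchg(uint16_t x)
{
    uint8_t lo = (uint8_t)(x & 0xff);
    uint8_t hi = (uint8_t)(x >> 8);
    return (uint16_t)(((uint16_t)lo << 8) | hi);
}
```

Both helpers return 0x3412 for the input 0x1234, so the choice between the two instruction forms is purely a matter of cost on the target, not of semantics.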