http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
--- Comment #10 from Oleg Smolsky <oleg.smolsky at gmail dot com> 2011-08-25 22:08:49 UTC ---
BTW, the uint16_t test also got slower for the very same reason. Here is the
innermost loop generated by g++ 4.6:

.text:0000000000400DA0 loc_400DA0:
.text:0000000000400DA0                 add     eax, 0Ah
.text:0000000000400DA3                 add     ax, [rdx]
.text:0000000000400DA6                 add     rdx, 2
.text:0000000000400DAA                 cmp     rdx, 5092E0h
.text:0000000000400DB1                 jnz     short loc_400DA0
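
For a source-level picture of the loop listed above, here is a minimal sketch of a reduction that would plausibly compile to it. This is an assumption reconstructed from the instructions alone (add 10, add a 16-bit element, advance the pointer by 2 bytes, compare against the end address), not the actual benchmark source; the function name `sum_plus_ten` and its parameters are hypothetical.

```cpp
#include <cstdint>

// Hypothetical reconstruction, not the benchmark's real code.
// Each iteration adds the constant 10 and one uint16_t element to a
// 16-bit accumulator, matching the "add eax, 0Ah" / "add ax, [rdx]" pair.
std::uint16_t sum_plus_ten(const std::uint16_t* first,
                           const std::uint16_t* last) {
    std::uint16_t result = 0;
    for (const std::uint16_t* p = first; p != last; ++p) {
        // The arithmetic is done in int, then truncated back to 16 bits,
        // so the accumulator wraps modulo 65536.
        result = static_cast<std::uint16_t>(result + *p + 10);
    }
    return result;
}
```

Note that the listing updates the accumulator with a 32-bit add on eax followed by a 16-bit add on ax. Only the low 16 bits matter to the result, which is what the truncating cast in the sketch expresses.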