http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53315
--- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-05-12 09:14:53 UTC ---
Created attachment 27385
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27385
gcc48-pr53315.patch

That is because the patch is buggy. Fixed thusly, though I haven't tested it on Haswell (obviously) or in a simulator.

Note, it would be nice to have a peephole or something similar to optimize the following sequence (I guess peepholes won't do anything across multiple basic blocks; perhaps machine reorg could handle it):

        movl $-1, %eax
        xbegin .L2
.L2:
        cmpl $-1, %eax
        jne .L3
        xorl %eax, %eax

into, say:

        movl $-1, %eax
        xbegin .L3
        xorl %eax, %eax

or even:

        xbegin .L3
        xorl %eax, %eax
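For illustration only, here is a minimal sketch of the two rewrites above. It is written in Python and treats the assembly as a plain list of strings; it is not GCC code, and the function names are hypothetical. It assumes that an RTM abort always writes its status into %eax and that this status is never -1 (the same assumption the `cmpl $-1, %eax` test already relies on). It also assumes the fallback label has no users other than the `xbegin` itself. The first step points `xbegin` directly at the `jne` target. The second drops `movl $-1, %eax` when the fall-through path immediately clobbers %eax, since the abort path overwrites it anyway.

```python
from typing import List


def label_uses(insns: List[str], label: str) -> int:
    """Count instructions whose last operand is `label` (jumps, xbegin)."""
    return sum(1 for s in insns if s.split()[-1:] == [label])


def fold_xbegin(insns: List[str]) -> List[str]:
    """Rewrite 'xbegin L; L: cmpl $-1, %eax; jne T' into 'xbegin T'.

    Hypothetical sketch: only fires when L is used solely by this xbegin.
    """
    out: List[str] = []
    i = 0
    while i < len(insns):
        cur = insns[i]
        parts = cur.split()
        if (len(parts) == 2 and parts[0] == "xbegin"
                and i + 3 < len(insns)
                and insns[i + 1] == parts[1] + ":"
                and insns[i + 2] == "cmpl $-1, %eax"
                and insns[i + 3].startswith("jne ")
                and label_uses(insns, parts[1]) == 1):
            # Abort jumps straight to the jne target; success falls through.
            out.append("xbegin " + insns[i + 3].split()[1])
            i += 4
        else:
            out.append(cur)
            i += 1
    return out


def drop_dead_status_init(insns: List[str]) -> List[str]:
    """Drop 'movl $-1, %eax' before 'xbegin T' when the fall-through
    path immediately zeroes %eax (abort overwrites %eax regardless)."""
    out: List[str] = []
    i = 0
    while i < len(insns):
        if (insns[i] == "movl $-1, %eax"
                and i + 2 < len(insns)
                and insns[i + 1].startswith("xbegin ")
                and insns[i + 2] == "xorl %eax, %eax"):
            i += 1  # skip the dead store
            continue
        out.append(insns[i])
        i += 1
    return out
```

Applied to the first sequence, `fold_xbegin` yields the three-instruction form, and `drop_dead_status_init` then yields the two-instruction form. A real pass would have to work on RTL and check liveness properly rather than match text.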