http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56200



--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-06 09:57:13 UTC ---

(In reply to comment #5)

> Optimized alignments are enabled for -O2 and above.  For -O2, there are:
>
>         .p2align 4,,10
>         .p2align 3
> .L19:
>         cmpl    file(,%rbx,4), %ebp
>         jg      .L18
>         cmpl    0(%r13,%rbx,4), %ebp
>         jg      .L18
>         cmpl    (%r12), %ebp
>         jle     .L22
>         .p2align 4,,10
>         .p2align 3
> .L18:
>
> and generate
>
>   400ab6:       66 2e 0f 1f 84 00 00 00 00 00   nopw   %cs:0x0(%rax,%rax,1)
>   400ac0:       3b 2c 9d a0 1a 60 00    cmp    0x601aa0(,%rbx,4),%ebp
>   400ac7:       7f 17                   jg     400ae0 <find+0x70>
>   400ac9:       41 3b 6c 9d 00          cmp    0x0(%r13,%rbx,4),%ebp
>   400ace:       7f 10                   jg     400ae0 <find+0x70>
>   400ad0:       41 3b 2c 24             cmp    (%r12),%ebp
>   400ad4:       7e 32                   jle    400b08 <find+0x98>
>   400ad6:       66 2e 0f 1f 84 00 00 00 00 00   nopw   %cs:0x0(%rax,%rax,1)
>
> Branch Predict Unit fetches 32-byte at a time.  There are 3 back-to-back
> fused cmp/jcc instructions in 32-byte window, which causes misprediction.
> We can add a nop after the first cmp/jcc to avoid back-to-back cmp/jccs.



Yeah, I suppose if we bother with alignment we should do that.  Can we
do it with a peephole to only do it between two consecutive cmp/jccs?
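
As a rough check of the window claim in comment #5, the sketch below takes the
addresses and encoded lengths from the quoted objdump listing and counts how
many adjacent cmp/jcc pairs begin in each 32-byte-aligned window.  The 32-byte
window size is the figure given in comment #5; the helper name
fused_pairs_per_window and the rule of keying a pair on the address of its cmp
are assumptions made for illustration, not anything GCC or the hardware
defines.

```python
from collections import Counter

WINDOW = 32  # bytes; the fetch-window size quoted in comment #5

# (address, length in bytes, mnemonic), copied from the quoted objdump listing
INSNS = [
    (0x400ac0, 7, "cmp"),
    (0x400ac7, 2, "jg"),
    (0x400ac9, 5, "cmp"),
    (0x400ace, 2, "jg"),
    (0x400ad0, 4, "cmp"),
    (0x400ad4, 2, "jle"),
]


def fused_pairs_per_window(insns, window=WINDOW):
    """Count cmp immediately followed by a jcc, keyed by window start address.

    Hypothetical helper: a pair is attributed to the aligned window that
    contains the first byte of its cmp.
    """
    counts = Counter()
    for (addr, length, op), (next_addr, _, next_op) in zip(insns, insns[1:]):
        is_adjacent = next_addr == addr + length
        if op == "cmp" and next_op.startswith("j") and is_adjacent:
            counts[(addr // window) * window] += 1
    return counts


counts = fused_pairs_per_window(INSNS)
for start, n in sorted(counts.items()):
    print(f"{start:#x}: {n} cmp/jcc pair(s)")
```

For the listing above this reports three pairs in the window starting at
0x400ac0, which is the situation a peephole would have to detect.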
