http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56200



H.J. Lu <hjl.tools at gmail dot com> changed:



           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |areg.melikadamyan at gmail
                   |                            |dot com



--- Comment #5 from H.J. Lu <hjl.tools at gmail dot com> 2013-02-05 23:50:35 UTC ---

Optimized alignments are enabled at -O2 and above.  At -O2, GCC emits:



        .p2align 4,,10
        .p2align 3
.L19:
        cmpl    file(,%rbx,4), %ebp
        jg      .L18
        cmpl    0(%r13,%rbx,4), %ebp
        jg      .L18
        cmpl    (%r12), %ebp
        jle     .L22
        .p2align 4,,10
        .p2align 3
.L18:



which assembles to:



  400ab6:       66 2e 0f 1f 84 00 00 00 00 00   nopw   %cs:0x0(%rax,%rax,1)
  400ac0:       3b 2c 9d a0 1a 60 00    cmp    0x601aa0(,%rbx,4),%ebp
  400ac7:       7f 17                   jg     400ae0 <find+0x70>
  400ac9:       41 3b 6c 9d 00          cmp    0x0(%r13,%rbx,4),%ebp
  400ace:       7f 10                   jg     400ae0 <find+0x70>
  400ad0:       41 3b 2c 24             cmp    (%r12),%ebp
  400ad4:       7e 32                   jle    400b08 <find+0x98>
  400ad6:       66 2e 0f 1f 84 00 00 00 00 00   nopw   %cs:0x0(%rax,%rax,1)
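
The two 10-byte nopw instructions in the disassembly follow from the
.p2align pairs in the listing.  As a rough sketch (not GCC or gas code;
the helper names p2align and loop_align are made up for illustration),
the assembler semantics can be modelled like this: ".p2align 4,,10"
pads to a 16-byte boundary only if that takes at most 10 bytes, and the
following ".p2align 3" then pads to an 8-byte boundary unconditionally.

```python
def p2align(addr, power, max_skip=None):
    """Model a gas-style `.p2align power,,max_skip` at address `addr`.

    Returns the address of the next instruction after any padding.
    If the padding needed exceeds max_skip, the directive does nothing.
    """
    align = 1 << power
    pad = -addr % align          # bytes needed to reach the next boundary
    if max_skip is not None and pad > max_skip:
        return addr              # too much padding: skip the alignment
    return addr + pad


def loop_align(addr):
    """Model the pair `.p2align 4,,10` followed by `.p2align 3`."""
    return p2align(p2align(addr, 4, 10), 3)


# Addresses taken from the disassembly above.
print(hex(loop_align(0x400ab6)))  # start of the first nopw
print(hex(loop_align(0x400ad6)))  # start of the second nopw
```

Both padding sites sit exactly 10 bytes short of a 16-byte boundary, so
each receives a full 10-byte nop, placing .L19 at 0x400ac0 and .L18 at
0x400ae0.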



The branch prediction unit fetches 32 bytes at a time.  There are three
back-to-back fused cmp/jcc pairs in a single 32-byte window, which causes
mispredictions.  We can add a nop after the first cmp/jcc pair so that the
pairs are no longer back-to-back.
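
To check the window claim against the disassembly, here is a small
sketch that groups the three conditional-jump addresses by 32-byte
window.  The 32-byte window size is the figure stated above; the helper
names are hypothetical and the addresses are copied from the objdump
output.

```python
WINDOW = 32  # fetch window size in bytes, as stated in the comment

# Addresses of the jg/jg/jle instructions from the disassembly.
JCC_ADDRESSES = [0x400ac7, 0x400ace, 0x400ad4]


def window_base(addr, window=WINDOW):
    """Return the start address of the aligned window containing addr."""
    return addr & ~(window - 1)


def group_by_window(addrs, window=WINDOW):
    """Map each window base address to the branch addresses inside it."""
    groups = {}
    for a in addrs:
        groups.setdefault(window_base(a, window), []).append(a)
    return groups


groups = group_by_window(JCC_ADDRESSES)
for base, addrs in sorted(groups.items()):
    print(f"{base:#x}: {[hex(a) for a in addrs]}")
```

All three branches map to the window starting at 0x400ac0, which is
also where .L19 begins.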
