http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56200



--- Comment #2 from Alexander Monakov <amonakov at gcc dot gnu.org> 2013-02-04 21:36:38 UTC ---

(In reply to comment #1)
> What happens if you also use -fno-ivopts ?



For me, -fno-ivopts gives a small improvement, but the result is still slower
than -O0.  I think the slowdown is related to code layout in the I-cache and
branch predictors.  There is a hot region composed of three consecutive
conditional branches (cmp-jg-cmp-jg-cmp-jg in optimized code and
mov-cmp-jl-mov-cmp-jl-mov-cmp-jl at -O0).  If I align the first _and_ the
second to a 16-byte boundary, I get better performance than -O0, but aligning
only one of those is still slower than -O0:



--- o1.s    2013-02-05 00:04:44.405072150 +0400
+++ o1h.s    2013-02-05 01:17:43.648014420 +0400
@@ -119,9 +119,11 @@ find:
     movq    %rdx, %rbp
     leal    1(%r14), %eax
     movl    %eax, 12(%rsp)
+    .p2align 4,,7
 .L18:
     cmpl    file(%r12), %r14d
     jg    .L17
+    .p2align 4,,7
     cmpl    (%r15,%r12), %r14d
     jg    .L17
     cmpl    (%rbx), %r14d
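
For reference, `.p2align 4,,7` in GNU as requests alignment to a 2^4 = 16-byte boundary, but only when that needs at most 7 bytes of padding; otherwise no padding is emitted at all. The padding rule can be sketched in a few lines of Python. The function name and the sample addresses below are hypothetical and are not taken from the actual layout of o1.s:

```python
def p2align_padding(address, power, max_skip=None):
    """Return the number of padding bytes that `.p2align power,,max_skip`
    would insert when the location counter is at `address`.

    Illustrative helper only (the name is made up for this note).
    """
    boundary = 1 << power            # .p2align 4 -> 16-byte boundary
    padding = -address % boundary    # bytes needed to reach the next boundary
    if max_skip is not None and padding > max_skip:
        return 0                     # too much padding: directive does nothing
    return padding


if __name__ == "__main__":
    # Hypothetical addresses, chosen only to show the three cases.
    for addr in (0x40, 0x49, 0x41):
        print(hex(addr), p2align_padding(addr, 4, 7))
```

So whether either directive in the diff actually moves a compare depends on where the preceding code happens to end. Checking the resulting addresses in a disassembly of the object file is the reliable way to confirm that both compares landed on 16-byte boundaries.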
