http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #32 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-03-02 08:28:34 UTC ---
For me, 4.1 is as fast as 4.6 on my CPU with the reduced testcase I've
attached (it's not clear whether it models what the original benchmark did),
and on the trunk it regressed with
http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=176072
Before that revision the inner loop looked like:
.L12:
        addl    $10, %edx
        addb    0(%rbp,%rcx), %dl
        addq    $1, %rcx
        cmpl    %ecx, %ebx
        jg      .L12
and now it looks like:
.L12:
        movzbl  0(%rbp,%rdx), %r8d
        addq    $1, %rdx
        cmpl    %edx, %ebx
        leal    10(%rcx,%r8), %ecx
        jg      .L12
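To make the comparison concrete, here is a C model of what each loop body computes. It is a sketch under assumptions: `old_loop`, `new_loop` and `check_models` are hypothetical names, registers are modeled as `uint32_t`, the loop counter and compare are left out, and the attached testcase itself is not reproduced. The old body adds 10 to %edx and then does a byte add into %dl only; the new body zero-extends each byte with movzbl and does one 32-bit leal. Both leave the same value in the low byte of the accumulator.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the old body: addl $10, %edx ; addb 0(%rbp,%rcx), %dl */
static uint8_t old_loop(const uint8_t *p, int n, uint32_t edx)
{
    for (int i = 0; i < n; i++) {
        edx += 10;                            /* addl $10, %edx */
        uint8_t dl = (uint8_t)(edx + p[i]);   /* addb: 8-bit add, carry out of bit 7 dropped */
        edx = (edx & 0xFFFFFF00u) | dl;       /* only %dl is written, upper bits kept */
    }
    return (uint8_t)edx;                      /* the low byte is the meaningful result */
}

/* Model of the new body: movzbl 0(%rbp,%rdx), %r8d ; leal 10(%rcx,%r8), %ecx */
static uint8_t new_loop(const uint8_t *p, int n, uint32_t ecx)
{
    for (int i = 0; i < n; i++) {
        uint32_t r8 = p[i];                   /* movzbl: zero-extend the byte */
        ecx = ecx + r8 + 10;                  /* leal: full 32-bit add, no flags */
    }
    return (uint8_t)ecx;
}

/* Aborts if the two models disagree in the low byte for this input. */
static void check_models(const uint8_t *p, int n, uint32_t init)
{
    assert(old_loop(p, n, init) == new_loop(p, n, init));
}
```

The model only shows that the two sequences agree on the result; it does not by itself explain why the newer loop is slower.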
