[Bug target/38824] [4.4 Regression] performance regression of sse code from 4.2/4.3

xuepeng dot guo at intel dot com Tue, 10 Feb 2009 23:37:22 -0800


------- Comment #22 from xuepeng dot guo at intel dot com  2009-02-11 07:37 -------
(In reply to comment #18)
> Xuepeng, can you test with the loop as produced by my posted patch, that is:
> .L11:
>         movaps  (%rsi,%rax), %xmm0
>         addps   %xmm1, %xmm0
>         movaps  %xmm0, (%rdi,%rax)
>         addq    $16, %rax
>         cmpq    %rdx, %rax
>         jne     .L11
> I don't have access to new enough chips.


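Your patch improved performance. My machine is an "Intel(R) Core(TM)2 Quad CPU
Q6700 @ 2.66GHz". Here gcc-42.out and gcc-44.out are the unpatched 4.2 and 4.4
builds, and gcc-44p.out is the 4.4 build with your patch. The results are: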

[xg...@shgcc-9 38824]$ time ./gcc-42.out

real    0m1.991s
user    0m1.990s
sys     0m0.000s
[xg...@shgcc-9 38824]$ time ./gcc-42.out

real    0m1.991s
user    0m1.991s
sys     0m0.001s
[xg...@shgcc-9 38824]$ time ./gcc-42.out

real    0m1.991s
user    0m1.989s
sys     0m0.002s
[xg...@shgcc-9 38824]$ time ./gcc-44.out

real    0m1.880s
user    0m1.879s
sys     0m0.001s
[xg...@shgcc-9 38824]$ time ./gcc-44.out

real    0m1.878s
user    0m1.878s
sys     0m0.000s
[xg...@shgcc-9 38824]$ time ./gcc-44.out

real    0m1.870s
user    0m1.869s
sys     0m0.002s
[xg...@shgcc-9 38824]$ time ./gcc-44p.out

real    0m1.690s
user    0m1.690s
sys     0m0.000s
[xg...@shgcc-9 38824]$ time ./gcc-44p.out

real    0m1.690s
user    0m1.689s
sys     0m0.002s
[xg...@shgcc-9 38824]$ time ./gcc-44p.out

real    0m1.690s
user    0m1.690s
sys     0m0.000s
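
To put the timings above in relative terms, here is a small sketch that
averages the three "real" times per binary and computes the fractional
reduction in run time. The helper names (mean, reduction) and the variable
names are made up for illustration; only the numbers come from the runs above.
By this arithmetic the patched build comes out roughly 10% faster than
unpatched 4.4 and roughly 15% faster than 4.2.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// "real" times in seconds, copied from the three runs of each binary above.
const std::vector<double> gcc42  = {1.991, 1.991, 1.991};
const std::vector<double> gcc44  = {1.880, 1.878, 1.870};
const std::vector<double> gcc44p = {1.690, 1.690, 1.690};

// Arithmetic mean of a non-empty set of samples (hypothetical helper).
double mean(const std::vector<double>& samples) {
    assert(!samples.empty());
    return std::accumulate(samples.begin(), samples.end(), 0.0) /
           static_cast<double>(samples.size());
}

// Fractional reduction in run time going from `before` to `after`
// (hypothetical helper): 0.10 means the run is 10% shorter.
double reduction(double before, double after) {
    assert(before > 0.0);
    return (before - after) / before;
}
```

For example, reduction(mean(gcc44), mean(gcc44p)) gives the gain of the
patched build over unpatched 4.4. Three runs on one machine are a small sample,
so treat the percentages as indicative only.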

The only difference in the generated assembly between 44.s and 44p.s is:

--- 44.s        2009-02-11 15:34:57.000000000 +0800
+++ 44p.s       2009-02-11 15:34:49.000000000 +0800
@@ -102,8 +102,8 @@ _Z7bench_1PfS_fj:
        .p2align 4,,10
        .p2align 3
 .L11:
-       movaps  %xmm0, %xmm1
-       addps   (%rsi,%rax), %xmm1
+       movaps  (%rsi,%rax), %xmm1
+       addps   %xmm0, %xmm1
        movaps  %xmm1, (%rdi,%rax)
        addq    $16, %rax
        cmpq    %rdx, %rax
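
For readers without the testcase at hand, the loop in this diff corresponds to
a kernel of roughly the following shape. This is a guess reconstructed from the
mangled symbol _Z7bench_1PfS_fj, which demangles to
bench_1(float*, float*, float, unsigned int), and from the three SSE
instructions in the loop body. It is not the benchmark source from the bug
report. The parameter names, the alignment and length preconditions, and the
scalar fallback are all assumptions.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE__)
#include <xmmintrin.h>  // _mm_set1_ps, _mm_load_ps, _mm_add_ps, _mm_store_ps
#endif

// Hypothetical reconstruction: dst[i] = src[i] + value for n floats.
// Assumes n is a multiple of 4 and both pointers are 16-byte aligned,
// which is what the aligned movaps accesses in the loop require.
void bench_1(float* dst, float* src, float value, unsigned n) {
    assert(n % 4 == 0);
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(src) % 16 == 0);
#if defined(__SSE__)
    // Broadcast the scalar once, outside the loop (%xmm0 in 44p.s).
    const __m128 v = _mm_set1_ps(value);
    for (unsigned i = 0; i < n; i += 4) {
        __m128 x = _mm_load_ps(src + i);  // aligned load of 4 floats
        x = _mm_add_ps(x, v);             // packed single-precision add
        _mm_store_ps(dst + i, x);         // aligned store of 4 floats
    }
#else
    // Scalar fallback so the sketch still builds on non-SSE targets.
    for (unsigned i = 0; i < n; ++i) dst[i] = src[i] + value;
#endif
}
```

The intrinsics do not dictate which form the compiler emits. Whether it copies
the invariant register and adds from memory (44.s) or loads from memory and
adds the register (44p.s) is a code generation choice, and that choice is what
the patch changes.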


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38824
