On Mon, Mar 03, 2014 at 11:02:14AM +0800, lin zuojian wrote:
>    I wrote some test code like this:
> void foo(int * a)
> {
>     a[0] = 0xfafafafb;
>     a[1] = 0xfafafafc;
>     a[2] = 0xfafafafe;
>     a[3] = 0xfafafaff;
>     a[4] = 0xfafafaf0;
>     a[5] = 0xfafafaf1;
>     a[6] = 0xfafafaf2;
>     a[7] = 0xfafafaf3;
>     a[8] = 0xfafafaf4;
>     a[9] = 0xfafafaf5;
>     a[10] = 0xfafafaf6;
>     a[11] = 0xfafafaf7;
>     a[12] = 0xfafafaf8;
>     a[13] = 0xfafafaf9;
>     a[14] = 0xfafafafa;
>     a[15] = 0xfafaf0fa;
> }
> This is what gcc generated:
>       movl    $-84215045, (%rdi)
>       movl    $-84215044, 4(%rdi)
>       movl    $-84215042, 8(%rdi)
>       movl    $-84215041, 12(%rdi)
>       movl    $-84215056, 16(%rdi)
>     ...
> This is what LLVM/clang generated:
>       movabsq $-361700855600448773, %rax # imm = 0xFAFAFAFCFAFAFAFB
>       movq    %rax, (%rdi)
>       movabsq $-361700842715546882, %rax # imm = 0xFAFAFAFFFAFAFAFE
>       movq    %rax, 8(%rdi)
>       movabsq $-361700902845089040, %rax # imm = 0xFAFAFAF1FAFAFAF0
>       movq    %rax, 16(%rdi)
>       movabsq $-361700894255154446, %rax # imm = 0xFAFAFAF3FAFAFAF2
>     ...
> I ran the code on my i7 machine 10000000000 times. Here is the result:
> gcc:
> real  0m50.613s
> user  0m50.559s
> sys   0m0.000s
> 
> LLVM/clang:
> real  0m32.036s
> user  0m32.001s
> sys   0m0.000s
> 
> That means movabsq did a better job!
> Should gcc's peephole pass perform such a combination?

This sounds like PR22141, but a microbenchmark isn't the right basis for
deciding this.  From what I remember when playing with the patches,
movabsq was mostly bad for performance, at least on the CPUs I tried
back then.  Besides the question of whether movabsq + movq is more
beneficial than two movl, alignment also plays a role here: for example,
if this is in an inner loop and not aligned to 64 bits, it might slow
things down too much.
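
As a rough illustration of the transformation under discussion, the
following C sketch shows how two adjacent 32-bit constants fold into the
single 64-bit immediate that clang materializes with movabsq, assuming a
little-endian target such as x86-64.  The helper names (combine_le,
store_two, store_one, is_aligned_8) are hypothetical and not part of gcc
or clang.  The alignment check reflects only one possible reading of the
alignment concern: an int pointer is guaranteed just 4-byte alignment, so
a merged 8-byte store may land on a misaligned address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build the 64-bit value that, stored once, writes
   the same bytes as two adjacent 32-bit stores on a little-endian target.
   'lo' goes to the lower address, 'hi' to the higher one. */
static uint64_t combine_le(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Two separate 32-bit stores, mirroring the pair of movl instructions. */
static void store_two(uint32_t *a, uint32_t lo, uint32_t hi)
{
    assert(a != NULL);
    a[0] = lo;
    a[1] = hi;
}

/* One 64-bit store, mirroring movabsq + movq.  memcpy is used so the
   sketch stays well-defined even if 'a' is only 4-byte aligned. */
static void store_one(uint32_t *a, uint32_t lo, uint32_t hi)
{
    assert(a != NULL);
    uint64_t v = combine_le(lo, hi);
    memcpy(a, &v, sizeof v);
}

/* Hypothetical check: does p sit on an 8-byte boundary? */
static int is_aligned_8(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}
```

For the first pair in the test code, combine_le(0xfafafafb, 0xfafafafc)
yields 0xFAFAFAFCFAFAFAFB, which matches the immediate in the clang
listing.  Whether the single wide store is actually faster in real code
is the question the microbenchmark alone does not settle.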

        Jakub
