On Mon, Mar 03, 2014 at 11:02:14AM +0800, lin zuojian wrote:
> I wrote a test code like this:
> void foo(int * a)
> {
>     a[0] = 0xfafafafb;
>     a[1] = 0xfafafafc;
>     a[2] = 0xfafafafe;
>     a[3] = 0xfafafaff;
>     a[4] = 0xfafafaf0;
>     a[5] = 0xfafafaf1;
>     a[6] = 0xfafafaf2;
>     a[7] = 0xfafafaf3;
>     a[8] = 0xfafafaf4;
>     a[9] = 0xfafafaf5;
>     a[10] = 0xfafafaf6;
>     a[11] = 0xfafafaf7;
>     a[12] = 0xfafafaf8;
>     a[13] = 0xfafafaf9;
>     a[14] = 0xfafafafa;
>     a[15] = 0xfafaf0fa;
> }
> that was what gcc generated:
>     movl $-84215045, (%rdi)
>     movl $-84215044, 4(%rdi)
>     movl $-84215042, 8(%rdi)
>     movl $-84215041, 12(%rdi)
>     movl $-84215056, 16(%rdi)
>     ...
> that was what LLVM/clang generated:
>     movabsq $-361700855600448773, %rax # imm = 0xFAFAFAFCFAFAFAFB
>     movq %rax, (%rdi)
>     movabsq $-361700842715546882, %rax # imm = 0xFAFAFAFFFAFAFAFE
>     movq %rax, 8(%rdi)
>     movabsq $-361700902845089040, %rax # imm = 0xFAFAFAF1FAFAFAF0
>     movq %rax, 16(%rdi)
>     movabsq $-361700894255154446, %rax # imm = 0xFAFAFAF3FAFAFAF2
>     ...
> I ran the code on my i7 machine for 10000000000 times.Here was the result:
> gcc:
> real 0m50.613s
> user 0m50.559s
> sys 0m0.000s
>
> LLVM/clang:
> real 0m32.036s
> user 0m32.001s
> sys 0m0.000s
>
> That mean movabsq did do a better job!
> Should gcc peephole pass add such a combine?

This sounds like PR22141, but a microbenchmark isn't the right way to
decide this.  From what I remember from playing with the patches, movabsq
was mostly bad for performance, at least on the CPUs I tried it on back
then.  Besides the question of whether movabsq + movq is more beneficial
than two movl, alignment also plays a role here: say, if this is in an
inner loop and not aligned to 64 bits, whether it slows things down too
much.

	Jakub
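
For reference, the combine in the quoted clang output pairs two adjacent 32-bit immediates into one 64-bit immediate, with the lower-addressed store in the low half because x86-64 is little-endian. A minimal sketch of that arithmetic follows; the helper names fuse_imm32_pair and fused_store_matches are made up for illustration and are not part of GCC or LLVM, and the byte-level check assumes a little-endian host.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a GCC/LLVM function): build the 64-bit
   immediate that replaces two adjacent 32-bit stores.  'lo' is the
   value stored at the lower address, 'hi' the one 4 bytes above it. */
static uint64_t fuse_imm32_pair(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}

/* Hypothetical check: returns 1 if one 64-bit store of the fused
   value leaves the same bytes in memory as the two 32-bit stores.
   This holds only on a little-endian host such as x86-64. */
static int fused_store_matches(uint32_t lo, uint32_t hi)
{
    unsigned char two_stores[8];
    unsigned char one_store[8];
    uint64_t fused = fuse_imm32_pair(lo, hi);

    memcpy(two_stores, &lo, sizeof lo);      /* like movl at offset 0 */
    memcpy(two_stores + 4, &hi, sizeof hi);  /* like movl at offset 4 */
    memcpy(one_store, &fused, sizeof fused); /* like movq at offset 0 */

    return memcmp(two_stores, one_store, sizeof one_store) == 0;
}
```

Calling fuse_imm32_pair(0xfafafafb, 0xfafafafc) gives 0xFAFAFAFCFAFAFAFB, the first movabsq immediate in the quoted listing. This only shows that the two forms store the same bytes; it says nothing about which one is faster.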