------- Comment #8 from michaelni at gmx dot at 2006-02-11 11:40 -------
I really think this should be fixed, otherwise gcc won't be able to follow the exponentially decaying performance curve which it has followed so accurately since 2.95 at least.

To show more clearly how much speed we could lose by fixing this, I was nice and benchmarked the code: a simple for loop running 100 times with the code inside, rdtsc-based timing outside, and a loop executed 1000 times surrounding it. Benchmarking was done on an 800 MHz Duron and a 500 MHz Pentium 3; the first number is the number of CPU cycles for the Duron, the second one for the P3.
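For reference, a minimal sketch of a harness of this shape. It assumes GCC or Clang, where `__rdtsc()` comes from `<x86intrin.h>`. The names `read_cycles` and `cycles_per_iteration`, the `clock()` fallback off x86, and the C statement standing in for the asm fragments are all hypothetical. It also assumes the reported figure is the average per inner-loop iteration, which the comment does not state.

```c
#include <stdint.h>
#include <time.h>
#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>   /* __rdtsc() on GCC and Clang */
#endif

/* Hypothetical global standing in for the variable `a` in the fragments. */
volatile unsigned int a;

/* Read a cycle counter; falls back to clock() ticks off x86 (assumption). */
static uint64_t read_cycles(void)
{
#if defined(__i386__) || defined(__x86_64__)
    return __rdtsc();
#else
    return (uint64_t)clock();
#endif
}

/* Average counter ticks per inner-loop iteration over 1000 timed runs. */
double cycles_per_iteration(void)
{
    enum { OUTER = 1000, INNER = 100 };
    uint64_t total = 0;

    for (int run = 0; run < OUTER; run++) {
        uint64_t start = read_cycles();
        for (int i = 0; i < INNER; i++) {
            /* Code under test; the asm fragments would go here instead. */
            a++;
            if (a)
                a++;
        }
        total += read_cycles() - start;
    }
    return (double)total / ((double)OUTER * INNER);
}
```

The result includes loop overhead, so it is only meaningful relative to an empty loop body timed the same way.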
First let me show you the optimal code by Steven Boscher(?):
"addl $1,a\n"
" je .L1\n"
"addl $1,a\n"
".L1:\n"
// 11.557 / 12.514

Now what gcc 3.4/3.2 generated:
"movl a, %%eax\n"
"incl %%eax\n"
"testl %%eax, %%eax\n"
"movl %%eax, a\n"
"je .L1\n"
"incl %%eax\n"
"movl %%eax, a\n"
".L1:\n"
// 6.220 / 6.159

The code generated by mainline had 2 rets, so it didn't fit in my benchmark loop.

The even better code by segher AT d12relay01 DOT megacenter.de.ibm.com:
"addl $1,a\n"
"sbbl $-1,a\n"
// 11.755 / 15.111

One case which you must be careful not to generate, as it's almost twice as fast as the one above while still being just 2 instructions, is:
"cmpl $-1,a\n"
"adcl $1,a\n"
// 7.827 / 7.422

Another 2 slightly faster variants are:
"movl a, %%eax\n"
"cmpl $-1,%%eax\n"
"adcl $1,%%eax\n"
"movl %%eax,a\n"
// 6.567 / 8.811

"movl a, %%eax\n"
"addl $1,%%eax\n"
"sbbl $-1,%%eax\n"
"movl %%eax,a\n"
// 6.564 / 8.813

What a 14-year-old script kid would write, and what gcc would generate if they were local variables:
"movl a, %%eax\n"
"incl %%eax\n"
"je .L1\n"
"incl %%eax\n"
".L1:\n"
"movl %%eax, a\n"
// 6.162 / 5.426

What I would write (as the variable isn't used in my testcase):
"\n"
// 2.155 / 2.410

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12395
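The two-instruction carry-flag variants above can be checked against the branching version with a small portable C model. This is only a sketch: the carry flag is modelled by hand as a 0/1 value, the reference semantics (increment, then increment again unless the result is zero) are read off the branching assembly, and all function names are hypothetical.

```c
#include <stdint.h>

/* Branching form: addl $1 ; je .L1 ; addl $1 ; .L1: */
uint32_t variant_branch(uint32_t a)
{
    a += 1;
    if (a != 0)
        a += 1;
    return a;
}

/* addl $1,a ; sbbl $-1,a */
uint32_t variant_add_sbb(uint32_t a)
{
    uint32_t carry = (a == UINT32_MAX);  /* CF from addl: set when a wraps to 0 */
    a += 1;
    return a - UINT32_MAX - carry;       /* sbbl $-1: a - (-1) - CF, mod 2^32 */
}

/* cmpl $-1,a ; adcl $1,a */
uint32_t variant_cmp_adc(uint32_t a)
{
    uint32_t carry = (a < UINT32_MAX);   /* CF from cmpl: borrow when a != -1 */
    return a + 1 + carry;                /* adcl $1: a + 1 + CF */
}

/* Returns 1 if all three variants agree on a few edge-case inputs. */
int variants_agree(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 5u, 0x7FFFFFFFu, 0xFFFFFFFEu, 0xFFFFFFFFu
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        uint32_t want = variant_branch(samples[i]);
        if (variant_add_sbb(samples[i]) != want ||
            variant_cmp_adc(samples[i]) != want)
            return 0;
    }
    return 1;
}
```

The interesting inputs are 0xFFFFFFFF, where the first increment wraps to zero and the second is skipped, and 0xFFFFFFFE, where the second increment is the one that wraps.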