------- Comment #2 from siarhei dot siamashka at gmail dot com  2010-08-15 01:01 -------
Here is another test example, now with some performance numbers for gcc 4.5.1
on 64-bit Intel Atom:

$ cat fibbonachi.c
/*******************/
#include <stdlib.h>

int fib(int n)
{
    int sum, previous = -1, result = 1;

    n++;
    while (--n >= 0)
    {
        sum = result + previous;
        previous = result;
        result = sum;
    }

    return result;
}

int main(void)
{
    if (fib(1000000000) != 1532868155)
        abort();
    return 0;
}
/*******************/

$ gcc -O2 -march=atom -o fibbonachi-O2 fibbonachi.c
$ gcc -Os -march=atom -o fibbonachi-Os fibbonachi.c

$ time ./fibbonachi-O2

real    0m3.722s
user    0m3.652s
sys     0m0.000s

$ time ./fibbonachi-Os

real    0m3.078s
user    0m3.044s
sys     0m0.000s


Loop code for -O2 optimizations on x86-64:

  18:   89 d1                   mov    %edx,%ecx
  1a:   89 c2                   mov    %eax,%edx
  1c:   8d 7f ff                lea    -0x1(%rdi),%edi
  1f:   8d 04 0a                lea    (%rdx,%rcx,1),%eax
  22:   83 ff ff                cmp    $0xffffffffffffffff,%edi
  25:   75 f1                   jne    18 <fib+0x18>

Loop code for -Os optimizations on x86-64:

   c:   8d 0c 10                lea    (%rax,%rdx,1),%ecx
   f:   89 c2                   mov    %eax,%edx
  11:   89 c8                   mov    %ecx,%eax
  13:   ff cf                   dec    %edi
  15:   79 f5                   jns    c <fib+0xc>



On ARM, the loop code is also suboptimal in all three cases below (a
flag-setting "subs" followed by "bge" would be enough, without any need for a
separate "cmn"/"cmp"):

-O2 on ARM:
  10:   e2433001        sub     r3, r3, #1
  14:   e0820001        add     r0, r2, r1
  18:   e3730001        cmn     r3, #1
  1c:   e1a01002        mov     r1, r2
  20:   e1a02000        mov     r2, r0
  24:   1afffff9        bne     10 <fib+0x10>

-Os on ARM:
   c:   e0831002        add     r1, r3, r2
  10:   e2400001        sub     r0, r0, #1
  14:   e1a02003        mov     r2, r3
  18:   e1a03001        mov     r3, r1
  1c:   e3500000        cmp     r0, #0
  20:   aafffff9        bge     c <fib+0xc>

-Os -mthumb on ARM:
   8:   1899            adds    r1, r3, r2
   a:   3801            subs    r0, #1
   c:   461a            mov     r2, r3
   e:   460b            mov     r3, r1
  10:   2800            cmp     r0, #0
  12:   daf9            bge.n   8 <fib+0x8>


The x86 and ARM results are similar in one respect: with -O2 optimizations,
both targets perform the redundant comparison against the constant -1.


-- 

siarhei dot siamashka at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|                            |4.5.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37734
