http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294


Siarhei Siamashka <siarhei.siamashka at gmail dot com> changed:



           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |siarhei.siamashka at gmail
                   |                            |dot com



--- Comment #9 from Siarhei Siamashka <siarhei.siamashka at gmail dot com> 2012-12-20 04:45:10 UTC ---

(In reply to comment #3)

> Actually this case should not be using post modify at all except how many bits
> does ARM have to use for an offset? I thought 16bits which means you don't need
> that at all and GCC should generate it without an increment.  Oh and this is a
> RTL opt issue.



GCC 4.7.2 and 4.8.0 20121219 (experimental) already seem to do this,
which hides the postincrement issue for the currently attached testcase.



However, postincrement is still a performance problem on ARM. The code I'm
having trouble with is the following:



/*******************************************/

typedef unsigned long long T;

void fill(T *buf, int n, T v)
{
    while ((n -= 16) >= 0)
    {
        *buf++ = v;
        *buf++ = v;
    }
}

/*******************************************/



$ arm-none-eabi-gcc-4.7.2 -O2 -mcpu=cortex-a8 -c test.c
$ objdump -d test.o

00000000 <fill>:
   0:    e2511010     subs    r1, r1, #16
   4:    412fff1e     bxmi    lr
   8:    e2511010     subs    r1, r1, #16
   c:    e1c020f0     strd    r2, [r0]
  10:    e1c020f8     strd    r2, [r0, #8]
  14:    e2800010     add     r0, r0, #16
  18:    5afffffa     bpl     8 <fill+0x8>
  1c:    e12fff1e     bx      lr
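
For readers who do not read ARM assembly every day, the 4.7.2 loop above can be
rendered back into C roughly as follows. This is only a hand-written sketch of
my reading of the listing, not compiler output; the name fill_like_472 is made
up for illustration, and the original fill() is repeated so the fragment
compiles on its own.

```c
typedef unsigned long long T;

/* Original testcase, repeated so this fragment is self-contained. */
void fill(T *buf, int n, T v)
{
    while ((n -= 16) >= 0)
    {
        *buf++ = v;
        *buf++ = v;
    }
}

/* Hypothetical hand-written rendering of the 4.7.2 listing. */
void fill_like_472(T *buf, int n, T v)
{
    n -= 16;                /* subs r1, r1, #16              */
    if (n < 0)              /* bxmi lr                       */
        return;
    do
    {
        n -= 16;            /* subs r1, r1, #16              */
        buf[0] = v;         /* strd r2, [r0]                 */
        buf[1] = v;         /* strd r2, [r0, #8]             */
        buf += 2;           /* add  r0, r0, #16              */
    } while (n >= 0);       /* bpl  8 (flags from the subs)  */
}
```

Both stores use fixed offsets from the same base register, and the pointer is
updated once per iteration.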





$ arm-none-eabi-gcc-4.8.0 -O2 -mcpu=cortex-a8 -c test.c
$ objdump -d test.o

00000000 <fill>:
   0:    e351000f     cmp     r1, #15
   4:    d12fff1e     bxle    lr
   8:    e2411010     sub     r1, r1, #16
   c:    e280c010     add     ip, r0, #16
  10:    e3c1100f     bic     r1, r1, #15
  14:    e08c1001     add     r1, ip, r1
  18:    e1c020f0     strd    r2, [r0]
  1c:    e2800010     add     r0, r0, #16
  20:    e14020f8     strd    r2, [r0, #-8]
  24:    e1500001     cmp     r0, r1
  28:    1afffffa     bne     18 <fill+0x18>
  2c:    e12fff1e     bx      lr
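
The 4.8.0 listing replaces the byte counter with a precomputed end pointer and
moves the pointer increment between the two stores. A rough C rendering of my
reading of that listing is below. It is a hand-written sketch, not compiler
output; the name fill_like_480 is made up for illustration, and the original
fill() is repeated so the fragment compiles on its own.

```c
typedef unsigned long long T;

/* Original testcase, repeated so this fragment is self-contained. */
void fill(T *buf, int n, T v)
{
    while ((n -= 16) >= 0)
    {
        *buf++ = v;
        *buf++ = v;
    }
}

/* Hypothetical hand-written rendering of the 4.8.0 listing. */
void fill_like_480(T *buf, int n, T v)
{
    char *p, *end;

    if (n <= 15)                        /* cmp r1, #15 ; bxle lr        */
        return;
    p = (char *)buf;
    /* sub r1, r1, #16 ; add ip, r0, #16 ; bic r1, r1, #15 ; add r1, ip, r1 */
    end = p + 16 + ((n - 16) & ~15);
    do
    {
        *(T *)p = v;                    /* strd r2, [r0]                */
        p += 16;                        /* add  r0, r0, #16             */
        *(T *)(p - 8) = v;              /* strd r2, [r0, #-8]           */
    } while (p != end);                 /* cmp r0, r1 ; bne 18          */
}
```

The end pointer is buf + 16 plus (n - 16) rounded down to a multiple of 16, so
the loop runs the same number of iterations as the original counter-based loop.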
