Hi,

From time to time I get the impression that the intra-procedure-call scratch 
register r12 (ip) is not used as often as it could be on ARM.

Example, compiled with GCC 6.2 (arm-none-eabi-gcc) for arm966e-s in ARM mode:

struct data {
  int flags;
};

extern void* func(struct data* dp);

struct data* test(struct data* dp)
{
  int saved_flags = dp->flags;
  struct data *dp2 = func(dp);
  dp->flags = saved_flags;
  return dp2;
}


This small function compiles to the following (GCC 6.2, with either -Os or -O2):

00000000 <test>:
   0:   e92d4070        push    {r4, r5, r6, lr}
   4:   e1a04000        mov     r4, r0
   8:   e5905000        ldr     r5, [r0]
   c:   ebfffffe        bl      0 <func>
  10:   e5845000        str     r5, [r4]
  14:   e8bd8070        pop     {r4, r5, r6, pc}


In this short example a function calls another function and has to save one 
value from the structure so that it can be restored after the call.

I guess the extra register in the push is there because the ABI requires the 
stack to stay 64-bit aligned, but the code is still not optimal.
Instead of pushing registers to the stack, the r12 scratch register could be 
used in some cases.

Couldn't this be compiled to the following instead, using r12 (ip):

00000000 <test>:
   0:   xxxxxxxx        push    {r4, lr}
   4:   xxxxxxxx        mov     r4, r0
   8:   xxxxxxxx        ldr     ip, [r0]
   c:   xxxxxxxx        bl      0 <func>
  10:   xxxxxxxx        str     ip, [r4]
  14:   xxxxxxxx        pop     {r4, pc}

The stack is still 64-bit aligned. The instruction count is the same, but the 
code should be faster, since there are 2 fewer loads and 2 fewer stores to 
(possibly external) memory.
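
To double-check the "2 fewer loads and 2 fewer stores" figure, here is a rough 
sketch in C that counts the registers in a push/pop register list and checks 
the 64-bit alignment. The helper names (reg_count, stack_bytes, 
keeps_8byte_alignment, words_saved) are made up for this mail, and the 
register-list bits for {r4, lr} are my own assumption, since I have no real 
encoding for the proposed code. It only counts stack words; it does not model 
whether ip is preserved across the bl.

```c
#include <assert.h>
#include <stdint.h>

/* Count the registers named in the low 16 bits of an ARM push/pop
   encoding; bit n set means rn is in the register list. */
unsigned reg_count(uint32_t insn)
{
    unsigned n = 0;
    for (uint32_t mask = insn & 0xffffu; mask != 0; mask >>= 1)
        n += mask & 1u;
    return n;
}

/* Each core register takes one 4-byte word on the stack. */
unsigned stack_bytes(uint32_t insn)
{
    return 4u * reg_count(insn);
}

/* True if the push moves sp by a multiple of 8 bytes (64 bits). */
int keeps_8byte_alignment(uint32_t insn)
{
    return stack_bytes(insn) % 8u == 0;
}

/* Words saved per push (and again per pop) when the list shrinks
   from {r4, r5, r6, lr} to {r4, lr}. */
unsigned words_saved(void)
{
    /* push {r4, r5, r6, lr}, encoding taken from the listing above */
    const uint32_t current = 0xe92d4070u;
    /* Assumption: register-list bits only (r4 = bit 4, lr = bit 14),
       not a complete instruction encoding. */
    const uint32_t proposed = (1u << 4) | (1u << 14);

    assert(keeps_8byte_alignment(current));
    assert(keeps_8byte_alignment(proposed));
    return reg_count(current) - reg_count(proposed);
}
```

words_saved() gives the difference for the push; the pop mirrors it, so the 
same number of loads is saved on the way out.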

I know the high registers r8-r12 are not always preferable with Thumb-1 or 
Thumb-2, but for ARM the penalty is smaller, I think, so maybe ip could be 
used more often?

How is the cost of ip calculated on ARM? It should in some sense be rather 
'cheap', since you don't have to push it to the stack for inter-procedure 
calls.

Thanks, and Best Regards,
Fredrik
