Hi,

From time to time I get the impression that the intra-procedure-call scratch register r12 (ip) is not used as often as it could be on ARM.
Example, compiled with GCC 6.2 for arm966e-s (ARM mode) with the arm-none-eabi-gcc toolchain:

    struct data {
        int flags;
    };

    extern void* func(struct data* dp);

    struct data* test(struct data* dp)
    {
        int saved_flags = dp->flags;
        struct data *dp2 = func(dp);
        dp->flags = saved_flags;
        return dp2;
    }

This small, simple function compiles to the following (using GCC 6.2 with either -Os or -O2):

    00000000 <test>:
       0:   e92d4070    push    {r4, r5, r6, lr}
       4:   e1a04000    mov     r4, r0
       8:   e5905000    ldr     r5, [r0]
       c:   ebfffffe    bl      0 <func>
      10:   e5845000    str     r5, [r4]
      14:   e8bd8070    pop     {r4, r5, r6, pc}

This is a short example in which a function calls another function and has to save one value from the structure across the call so that it can be restored afterwards. I guess the extra register (r6) is pushed because the ABI requires the stack to stay 64-bit aligned, but the code is still not optimal.

Instead of pushing more registers to the stack, the r12 scratch register could be used in some cases. Couldn't this be compiled as follows, using r12 (ip)?

    00000000 <test>:
       0:   xxxxxxxx    push    {r4, lr}
       4:   xxxxxxxx    mov     r4, r0
       8:   xxxxxxxx    ldr     ip, [r0]
       c:   xxxxxxxx    bl      0 <func>
      10:   xxxxxxxx    str     ip, [r4]
      14:   xxxxxxxx    pop     {r4, pc}

The stack is still 64-bit aligned. The instruction count is the same, but the code should be faster, since there are two fewer loads and two fewer stores to (possibly external) memory.

I know the high registers r8-r12 are not always preferable in Thumb-1 or Thumb-2, but for ARM the penalty is smaller, I think, so maybe ip could be used more often?

How is the cost of ip calculated on ARM? It should in some sense be rather cheap, since you don't have to push it to the stack across procedure calls.

Thanks, and Best Regards,
Fredrik