On Thu, Jul 2, 2015 at 2:07 AM, Richard Earnshaw
<richard.earns...@foss.arm.com> wrote:
> Not quite, ARM state still has more flexible addressing modes for
> unsigned byte loads than for signed byte loads.  It's even worse with
> thumb1 where some signed loads have no single-register addressing mode
> (ie you have to copy zero into another register to use as an index
> before doing the load).

I wasn't aware of the load address problem.  That was something I
hadn't considered, and I will have to look at it.  Load is just one
instruction though.  For most other instructions, a zero-extend
results in less efficient code, because it then forces a sign-extend
before a signed operation.  The fact that parameters and locals are
handled differently, which requires conversions when copying between
them, results in less efficient code.  And changing
TARGET_PROMOTE_FUNCTION_MODE is an ABI change, and hence would be
unwise, so changing PROMOTE_MODE is the safer option.

Consider this testcase:

    extern signed short gs;

    short
    sub (void)
    {
      signed short s = gs;
      int i;

      for (i = 0; i < 10; i++)
        {
          s += 1;
          if (s > 10)
            break;
        }

      return s;
    }

The inner loop ends up as

    .L3:
            adds    r3, r3, #1
            mov     r0, r1
            uxth    r3, r3
            sxth    r2, r3
            cmp     r2, #10
            bgt     .L8
            cmp     r2, r1
            bne     .L3
            bx      lr

We need the sign-extension for the compare.  We need the
zero-extension for the loop carried dependency.  We have two
extensions in every loop iteration, plus some extra register usage and
register movement.  We get better code for this example if we aren't
forcing signed shorts to be zero-extended via PROMOTE_MODE.

The lack of a reg+immediate address mode for ldrs[bh] in thumb1 does
look like a problem though.  But this means the difference between
generating

            movs    r2, #0
            ldrsh   r3, [r3, r2]

with my patch, or

            ldrh    r3, [r3]
            lsls    r2, r3, #16
            asrs    r2, r2, #16

without my patch.  It isn't clear which sequence is better.  The
sign-extends in the second sequence can sometimes be optimized away,
and sometimes they can't.  Similarly, in the first sequence, loading
zero into a reg can sometimes be optimized away, and sometimes it
can't.  There is also no guarantee that you get the first sequence
with the patch or the second sequence without the patch.  There is a
splitter for ldrsh, so you can get the second pattern sometimes with
the patch.
Similarly, it might be possible to get the first pattern without the
patch in some cases, though I don't have one at the moment.

Jim
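
P.S. As a rough illustration of why the loop in the testcase needs both extensions, here is a small C model of the register contents.  This is only a sketch, not compiler output: the helper names (uxth, sxth, lsls16_asrs16, sub_model) are made up for this example and simply mimic the instructions of the same name, and sub_model takes the starting value as a parameter instead of reading the global gs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of uxth: keep the low 16 bits, clear the rest. */
uint32_t uxth(uint32_t r)
{
    return r & 0xFFFFu;
}

/* Hypothetical model of sxth: sign-extend the low 16 bits to 32 bits. */
int32_t sxth(uint32_t r)
{
    uint32_t low = r & 0xFFFFu;
    return (low & 0x8000u) ? (int32_t)low - 0x10000 : (int32_t)low;
}

/* Hypothetical model of the thumb1 pair "lsls #16; asrs #16".
   The arithmetic shift is spelled out on unsigned values so the
   model does not rely on implementation-defined signed shifts. */
int32_t lsls16_asrs16(uint32_t r)
{
    uint32_t v = r << 16;             /* lsls: low half moves to the top */
    uint32_t res = v >> 16;           /* logical shift back down */
    if (v & 0x80000000u)
        res |= 0xFFFF0000u;           /* asrs: replicate the sign bit */
    return (res & 0x80000000u) ? -(int32_t)(~res) - 1 : (int32_t)res;
}

/* Model of the testcase loop when the short is kept zero-extended in
   a register (r3) and must be sign-extended (r2) before each signed
   compare.  Assumes the starting value is passed in, unlike the
   original, which reads it from gs. */
int32_t sub_model(int16_t start)
{
    uint32_t r3 = uxth((uint32_t)start);
    int32_t r2 = sxth(r3);

    for (int i = 0; i < 10; i++) {
        r3 = uxth(r3 + 1);            /* adds, then uxth for the carried value */
        r2 = sxth(r3);                /* sxth so the signed compare is right */
        if (r2 > 10)                  /* cmp r2, #10 ; bgt */
            break;
    }
    return r2;
}
```

With a negative starting value the zero-extended register holds a large positive number, so comparing it directly against 10 would leave the loop too early.  That is why the sign-extension can't simply be dropped while the value is kept zero-extended.  The lsls/asrs helper is there only to show that the two-shift thumb1 sequence computes the same value as sxth.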