On 27/01/12 16:07, Andrew Stubbs wrote:
> Hi all,
>
> This patch introduces a new, more efficient set of DImode shift
> sequences for values stored in core-registers (as opposed to VFP/NEON
> registers).
>
> The new sequences take advantage of knowledge of what the ARM
> instructions do with out-of-range shift amounts.
>
> The following are examples for a simple test case, like this one:
>
> long long
> f (long long *a, int b)
> {
>   return *a << b;
> }
>
>
> In ARM mode, old left-shift vs. the new one:
>
>   stmfd   sp!, {r4, r5}       | ldrd    r2, [r0]
>   rsb     r4, r1, #32         | mov     ip, r1
>   ldr     r5, [r0, #4]        | stmfd   sp!, {r4, r5}
>   subs    ip, r1, #32         | sub     r5, ip, #32
>   ldr     r0, [r0, #0]        | rsb     r4, ip, #32
>   mov     r3, r5, asl r1      | mov     r1, r3, asl ip
>   orr     r3, r3, r0, lsr r4  | mov     r0, r2, asl ip
>   mov     r2, r0, asl r1      | orr     r1, r1, r2, asl r5
>   movpl   r3, r0, asl ip      | orr     r1, r1, r2, lsr r4
>   mov     r0, r2              | ldmfd   sp!, {r4, r5}
>   mov     r1, r3              | bx      lr
>   ldmfd   sp!, {r4, r5}       |
>   bx      lr                  |
>
> In Thumb mode, old left-shift vs. new:
>
>   ldr     r2, [r0, #0]        | ldrd    r2, [r0]
>   ldr     r3, [r0, #4]        | push    {r4, r5, r6}
>   push    {r4, r5, r6}        | sub     r6, r1, #32
>   rsb     r6, r1, #32         | mov     r4, r1
>   sub     r4, r1, #32         | rsb     r5, r1, #32
>   lsls    r3, r3, r1          | lsls    r6, r2, r6
>   lsrs    r6, r2, r6          | lsls    r1, r3, r1
>   lsls    r5, r2, r4          | lsrs    r5, r2, r5
>   orrs    r3, r3, r6          | lsls    r0, r2, r4
>   lsls    r0, r2, r1          | orrs    r1, r1, r6
>   bics    r1, r5, r4, asr #32 | orrs    r1, r1, r5
>   it      cs                  | pop     {r4, r5, r6}
>   movcs   r1, r3              | bx      lr
>   pop     {r4, r5, r6}        |
>   bx      lr                  |
>
> Logical right shift is essentially the same sequence as the left shift
> above. However, arithmetic right shift requires something slightly
> different. Here it is in ARM mode, old vs. new:
>
>   stmfd   sp!, {r4, r5}       | ldrd    r2, [r0]
>   rsb     r4, r1, #32         | mov     ip, r1
>   ldr     r5, [r0, #0]        | stmfd   sp!, {r4, r5}
>   subs    ip, r1, #32         | rsb     r5, ip, #32
>   ldr     r0, [r0, #4]        | subs    r4, ip, #32
>   mov     r2, r5, lsr r1      | mov     r0, r2, lsr ip
>   orr     r2, r2, r0, asl r4  | mov     r1, r3, asr ip
>   mov     r3, r0, asr r1      | orr     r0, r0, r3, asl r5
>   movpl   r2, r0, asr ip      | orrge   r0, r0, r3, asr r4
>   mov     r1, r3              | ldmfd   sp!, {r4, r5}
>   mov     r0, r2              | bx      lr
>   ldmfd   sp!, {r4, r5}       |
>   bx      lr                  |
>
> I won't bore you with the Thumb mode comparison.
>
> The shift-by-constant cases have also been reimplemented, although the
> resultant sequences are much the same as before. (Doing this isn't
> strictly necessary just yet, but when I post my next patch to do 64-bit
> shifts in NEON, this feature will be required by the fall-back
> alternatives.)
>
> I've run a regression test on a cross-compiler, and I should have native
> test results next week sometime. Also some benchmark results.
>
> Is this OK for stage 1?
>

What's the impact of this on -Os?  At present we fall back to the
libcalls, but I can't immediately see how the new code would do that.

Gut feeling is that shift by a constant is always worth inlining at
-Os, but shift by a register isn't.

R.
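
For anyone reading the quoted sequences cold, here is a rough C model of why the new ARM-mode code needs no conditional move for the left shift. It assumes the usual behaviour of register-controlled shifts on ARM: only the bottom byte of the shift register is used, LSL/LSR by 32 or more give 0, and ASR by 32 or more gives a word of sign bits. All function names below are made up for illustration and do not come from the patch; this is a sketch for shift amounts 0..63, not a description of the RTL.

```c
#include <assert.h>
#include <stdint.h>

/* Models of ARM register-controlled shifts (bottom byte of rs only). */
static uint32_t arm_lsl(uint32_t x, uint32_t rs)
{
    uint32_t n = rs & 0xff;
    return n >= 32 ? 0 : x << n;
}

static uint32_t arm_lsr(uint32_t x, uint32_t rs)
{
    uint32_t n = rs & 0xff;
    return n >= 32 ? 0 : x >> n;
}

static uint32_t arm_asr(uint32_t x, uint32_t rs)
{
    uint32_t n = rs & 0xff;
    uint32_t sign = (x & 0x80000000u) ? 0xffffffffu : 0;
    if (n >= 32)
        return sign;                      /* all sign bits */
    return (x >> n) | (sign & ~(0xffffffffu >> n));
}

/* Hypothetical model of the new ARM-mode left shift, b in 0..63. */
static uint64_t shl64_model(uint64_t v, uint32_t b)
{
    uint32_t lo = (uint32_t)v, hi = (uint32_t)(v >> 32);
    uint32_t hi_out = arm_lsl(hi, b);     /* mov r1, r3, asl ip     */
    uint32_t lo_out = arm_lsl(lo, b);     /* mov r0, r2, asl ip     */
    hi_out |= arm_lsl(lo, b - 32u);       /* orr r1, r1, r2, asl r5 */
    hi_out |= arm_lsr(lo, 32u - b);       /* orr r1, r1, r2, lsr r4 */
    return ((uint64_t)hi_out << 32) | lo_out;
}

/* Hypothetical model of the new arithmetic right shift, b in 0..63.
   The last OR must be conditional because an out-of-range ASR yields
   sign bits rather than 0. */
static uint64_t asr64_model(uint64_t v, uint32_t b)
{
    uint32_t lo = (uint32_t)v, hi = (uint32_t)(v >> 32);
    uint32_t lo_out = arm_lsr(lo, b);     /* mov r0, r2, lsr ip     */
    uint32_t hi_out = arm_asr(hi, b);     /* mov r1, r3, asr ip     */
    lo_out |= arm_lsl(hi, 32u - b);       /* orr r0, r0, r3, asl r5 */
    if (b >= 32)                          /* orrge (b - 32 >= 0)    */
        lo_out |= arm_asr(hi, b - 32u);
    return ((uint64_t)hi_out << 32) | lo_out;
}

/* Compare both models against plain 64-bit shifts for every b in 0..63. */
static void check_models(uint64_t v)
{
    uint64_t sign = (v >> 63) ? UINT64_MAX : 0;
    for (uint32_t b = 0; b < 64; b++) {
        assert(shl64_model(v, b) == v << b);
        assert(asr64_model(v, b) == ((v >> b) | (sign & ~(UINT64_MAX >> b))));
    }
}
```

Calling `check_models` on a few values exercises both halves of the range: for b < 32 the `b - 32` operand wraps to a bottom byte of 224 or more and contributes 0, while for b > 32 the `32 - b` operand does the same. At b == 32 both cross terms are shifts by zero and supply the same word, so the OR is still correct.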