Re: [PATCH][ARM] Improve 64-bit shifts (non-NEON)

Richard Earnshaw Mon, 30 Jan 2012 07:26:28 -0800

On 27/01/12 16:07, Andrew Stubbs wrote:
> Hi all,
> 
> This patch introduces a new, more efficient set of DImode shift 
> sequences for values stored in core-registers (as opposed to VFP/NEON 
> registers).
> 
> The new sequences take advantage of knowledge of what the ARM 
> instructions do with out-of-range shift amounts: a register-specified 
> LSL or LSR by 32 or more yields zero rather than wrapping.
> 
> The following are examples for a simple test case, like this one:
> 
> long long
> f (long long *a, int b)
> {
>    return *a << b;
> }
> 
> 
> In ARM mode, old left-shift vs. the new one:
> 
>      stmfd   sp!, {r4, r5}        | ldrd    r2, [r0]
>      rsb     r4, r1, #32          | mov     ip, r1
>      ldr     r5, [r0, #4]         | stmfd   sp!, {r4, r5}
>      subs    ip, r1, #32          | sub     r5, ip, #32
>      ldr     r0, [r0, #0]         | rsb     r4, ip, #32
>      mov     r3, r5, asl r1       | mov     r1, r3, asl ip
>      orr     r3, r3, r0, lsr r4   | mov     r0, r2, asl ip
>      mov     r2, r0, asl r1       | orr     r1, r1, r2, asl r5
>      movpl   r3, r0, asl ip       | orr     r1, r1, r2, lsr r4
>      mov     r0, r2               | ldmfd   sp!, {r4, r5}
>      mov     r1, r3               | bx      lr
>      ldmfd   sp!, {r4, r5}        |
>      bx      lr                   |
> 
> In Thumb mode, old left-shift vs. new:
> 
>      ldr     r2, [r0, #0]         | ldrd    r2, [r0]
>      ldr     r3, [r0, #4]         | push    {r4, r5, r6}
>      push    {r4, r5, r6}         | sub     r6, r1, #32
>      rsb     r6, r1, #32          | mov     r4, r1
>      sub     r4, r1, #32          | rsb     r5, r1, #32
>      lsls    r3, r3, r1           | lsls    r6, r2, r6
>      lsrs    r6, r2, r6           | lsls    r1, r3, r1
>      lsls    r5, r2, r4           | lsrs    r5, r2, r5
>      orrs    r3, r3, r6           | lsls    r0, r2, r4
>      lsls    r0, r2, r1           | orrs    r1, r1, r6
>      bics    r1, r5, r4, asr #32  | orrs    r1, r1, r5
>      it      cs                   | pop     {r4, r5, r6}
>      movcs   r1, r3               | bx      lr
>      pop     {r4, r5, r6}         |
>      bx      lr                   |
> 
> Logical right shift is essentially the same sequence as the left shift 
> above. However, arithmetic right shift requires something slightly 
> different. Here it is in ARM mode, old vs. new:
> 
>      stmfd   sp!, {r4, r5}        | ldrd    r2, [r0]
>      rsb     r4, r1, #32          | mov     ip, r1
>      ldr     r5, [r0, #0]         | stmfd   sp!, {r4, r5}
>      subs    ip, r1, #32          | rsb     r5, ip, #32
>      ldr     r0, [r0, #4]         | subs    r4, ip, #32
>      mov     r2, r5, lsr r1       | mov     r0, r2, lsr ip
>      orr     r2, r2, r0, asl r4   | mov     r1, r3, asr ip
>      mov     r3, r0, asr r1       | orr     r0, r0, r3, asl r5
>      movpl   r2, r0, asr ip       | orrge   r0, r0, r3, asr r4
>      mov     r1, r3               | ldmfd   sp!, {r4, r5}
>      mov     r0, r2               | bx      lr
>      ldmfd   sp!, {r4, r5}        |
>      bx      lr                   |
> 
> I won't bore you with the Thumb mode comparison.
> 
> The shift-by-constant cases have also been reimplemented, although the 
> resultant sequences are much the same as before. (Doing this isn't 
> strictly necessary just yet, but when I post my next patch to do 64-bit 
> shifts in NEON, this feature will be required by the fall-back 
> alternatives.)
> 
> I've run a regression test on a cross-compiler, and I should have native 
> test results, along with some benchmark results, sometime next week.
> 
> Is this OK for stage 1?
>


What's the impact of this on -Os?  At present we fall back to the
libcalls, but I can't immediately see how the new code would do that.

Gut feeling is that shift by a constant is always worth inlining at -Os,
but shift by a register isn't.

R.
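
For readers following the thread, here is a minimal C sketch of the idea behind the new register-shift sequences quoted above. It is not taken from the patch. The helpers arm_lsl, arm_lsr and arm_asr are hypothetical names that model ARM register-controlled shifts (only the bottom byte of the shift register is used; LSL/LSR by 32 or more give zero, ASR by 32 or more gives the sign fill), and lsl64/asr64 mirror the new ARM-mode left-shift and arithmetic-right-shift sequences under the assumption that the shift amount is in the range 0..63.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical models of ARM register-controlled shifts: the amount is the
   bottom byte of the register, and amounts >= 32 do not wrap. */
static uint32_t arm_lsl(uint32_t x, int32_t n)
{
    uint32_t amt = (uint32_t)n & 0xff;
    return amt >= 32 ? 0 : x << amt;
}

static uint32_t arm_lsr(uint32_t x, int32_t n)
{
    uint32_t amt = (uint32_t)n & 0xff;
    return amt >= 32 ? 0 : x >> amt;
}

static uint32_t arm_asr(uint32_t x, int32_t n)
{
    uint32_t amt = (uint32_t)n & 0xff;
    uint32_t sign = (x & 0x80000000u) ? 0xffffffffu : 0;
    if (amt >= 32)
        return sign;
    if (amt == 0)
        return x;
    return (x >> amt) | (sign << (32 - amt));
}

/* Branch-free 64-bit left shift, n assumed in 0..63.  Exactly one of the two
   "cross" terms from the low word is non-zero unless n == 32, where both
   equal lo and the OR is harmless. */
static uint64_t lsl64(uint64_t v, int32_t n)
{
    uint32_t lo = (uint32_t)v, hi = (uint32_t)(v >> 32);
    uint32_t new_hi = arm_lsl(hi, n) | arm_lsl(lo, n - 32) | arm_lsr(lo, 32 - n);
    uint32_t new_lo = arm_lsl(lo, n);
    return ((uint64_t)new_hi << 32) | new_lo;
}

/* 64-bit arithmetic right shift, n assumed in 0..63.  An out-of-range ASR
   yields the sign fill, not zero, so that term must be conditional
   (the "orrge" in the quoted sequence). */
static uint64_t asr64(uint64_t v, int32_t n)
{
    uint32_t lo = (uint32_t)v, hi = (uint32_t)(v >> 32);
    uint32_t new_lo = arm_lsr(lo, n) | arm_lsl(hi, 32 - n);
    if (n - 32 >= 0)
        new_lo |= arm_asr(hi, n - 32);
    uint32_t new_hi = arm_asr(hi, n);
    return ((uint64_t)new_hi << 32) | new_lo;
}

/* Compare both sketches against plain 64-bit arithmetic for every n in 0..63. */
static void self_check(void)
{
    static const uint64_t samples[] = {
        0, 1, 0x8000000000000000ull, 0x0123456789abcdefull, 0xfedcba9876543210ull
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        for (int32_t n = 0; n < 64; n++) {
            uint64_t v = samples[i];
            uint64_t fill = (v >> 63) ? ~(~0ull >> n) : 0; /* portable sign fill */
            assert(lsl64(v, n) == v << n);
            assert(asr64(v, n) == ((v >> n) | fill));
        }
    }
}
```

Calling self_check() exercises every shift amount from 0 to 63. The logical right shift would follow the same pattern as lsl64 with the roles of the two words swapped.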

