On 07/12/11 14:03, Andrew Stubbs wrote:
> On Wed 07 Dec 2011 13:42:37 GMT, Richard Earnshaw wrote:
>> So it looks like the code generated for core registers with thumb2 is
>> pretty rubbish (no real surprise there -- to get the best code you need
>> to make use of the fact that on ARM a shift by a small negative number
>> (> -128) will give zero).  This gives us sequences like:
>>
>> For ARM state it's something like (untested)
>>
>>                                      @ shft < 32                     , shft >= 32
>> __ashldi3_v3:
>>      sub     r3, r2, #32             @ -ve                           , shft - 32
>>      lsl     ah, ah, r2              @ ah << shft                    , 0
>>      rsb     ip, r2, #32             @ 32 - shft                     , -ve
>>      orr     ah, ah, al, lsl r3      @ ah << shft                    , al << shft - 32
>>      orr     ah, ah, al, lsr ip      @ ah << shft | al >> 32 - shft  , al << shft - 32
>>      lsl     al, al, r2              @ al << shft                    , 0
>>
>> For Thumb2 (where there is no orr with register shift)
>>
>>      lsls    ah, ah, r2              @ ah << shft                    , 0
>>      sub     r3, r2, #32             @ -ve                           , shft - 32
>>      lsl     ip, al, r3              @ 0                             , al << shft - 32
>>      negs    r3, r3                  @ 32 - shft                     , -ve
>>      orr     ah, ah, ip              @ ah << shft                    , al << shft - 32
>>      lsr     r3, al, r3              @ al >> 32 - shft               , 0
>>      orrs    ah, ah, r3              @ ah << shft | al >> 32 - shft  , al << shft - 32
>>      lsls    al, al, r2              @ al << shft                    , 0
>>
>> Neither of which needs the condition flags during execution (and indeed
>> is probably better in both cases than the code currently in lib1funcs.asm
>> for a modern core).  The flag clobbering behaviour in the thumb2 variant
>> is only for code size saving; that would normally be added by a late
>> optimization pass.
> 
> OK, those are interesting, and I can look into making it happen, with or 
> without NEON.
> 
> Would it not require an unspec to prevent 'clever things' happening to 
> the negative shift, if we were to encode these in the machine 
> description? I'm not too clear on what these 'clever things' might be in 
> the case of shift-by-register (presumably value-range propagation is 
> one), but I know the NEON shifts are encoded this way for safety.
> 

Given the way the shift patterns in the compiler are written today, quite
possibly.  Though in the general case of a non-constant shift the optimizer
probably wouldn't be able to safely make any assumptions that would break
things.

I suspect that the shift patterns should really be changed to make the shift
be by a QImode value; this would then correctly describe the number of bits in
the register that are really involved in the shift.  Further, we could then
say that, for core registers, the full value in that QI register was used to
determine the shift.  It would be quite a lot of churn to fix this though.
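
To make the "full value in that QI register" point concrete, here is a rough
C model of the ARM-state sequence quoted above.  It is only a sketch: the
helper names (arm_lsl, arm_lsr, model_ashldi3) are made up for illustration,
and it assumes the usual ARM rule that a register-specified LSL/LSR looks at
the bottom byte of the shift register and yields zero for amounts of 32 or
more.

```c
#include <stdint.h>

/* Assumed model of ARM register-specified LSL: only the bottom byte of the
   shift register counts, and an amount of 32 or more yields zero. */
static uint32_t arm_lsl(uint32_t value, uint32_t shift_reg)
{
    uint32_t amount = shift_reg & 0xffu;
    return amount >= 32 ? 0 : value << amount;
}

/* Same rule for register-specified LSR. */
static uint32_t arm_lsr(uint32_t value, uint32_t shift_reg)
{
    uint32_t amount = shift_reg & 0xffu;
    return amount >= 32 ? 0 : value >> amount;
}

/* Hypothetical C rendering of the quoted ARM-state sequence, for a shift
   amount in the range 0..63.  No comparison or flag is needed. */
static uint64_t model_ashldi3(uint64_t x, uint32_t shft)
{
    uint32_t al = (uint32_t)x;
    uint32_t ah = (uint32_t)(x >> 32);

    uint32_t r3 = shft - 32u;   /* sub r3, r2, #32        */
    ah = arm_lsl(ah, shft);     /* lsl ah, ah, r2         */
    uint32_t ip = 32u - shft;   /* rsb ip, r2, #32        */
    ah |= arm_lsl(al, r3);      /* orr ah, ah, al, lsl r3 */
    ah |= arm_lsr(al, ip);      /* orr ah, ah, al, lsr ip */
    al = arm_lsl(al, shft);     /* lsl al, al, r2         */

    return ((uint64_t)ah << 32) | al;
}
```

For shft < 32 the bottom byte of r3 is at least 0xe0, so the "lsl r3" term
drops out; for shft > 32 the same happens to the "lsr ip" term.  That is the
property a QImode shift operand would let the patterns express.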

>> None of this directly helps with your neon usage, but it does show that we
>> really don't need to clobber the condition code register to get an
>> efficient sequence.
> 
> Except that it doesn't in the case of a shift by one where there is a 
> two-instruction sequence that clobbers CC. Presumably this special case 
> can be treated differently though, right from expand.
> 

All of the sequences above can be simplified significantly if the shift amount
is constant, and I think then that, with the exception of the special case you
mention (which is only for shift right by 1), you never need the condition
codes and you never need more than 3 ARM instructions:

shifts < 32

LSL     AH, AH, #n
ORR     AH, AH, AL, LSR #(32 - n)
LSL     AL, AL, #n

shifts >= 32

LSL     AH, AL, #(n - 32)
MOV     AL, #0
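
As a sanity check on the two constant-shift sequences, a small C sketch
follows.  The function name shl64_const is hypothetical, and the model is
restricted to n in 1..63 (a shift by 0 needs no instructions, and a C shift
of a 32-bit value by 32 would be undefined).

```c
#include <stdint.h>

/* Hypothetical model of the constant-shift sequences; n must be 1..63. */
static uint64_t shl64_const(uint32_t ah, uint32_t al, unsigned n)
{
    if (n < 32) {
        ah = ah << n;            /* LSL AH, AH, #n               */
        ah |= al >> (32 - n);    /* ORR AH, AH, AL, LSR #(32 - n) */
        al = al << n;            /* LSL AL, AL, #n               */
    } else {
        ah = al << (n - 32);     /* LSL AH, AL, #(n - 32)        */
        al = 0;                  /* MOV AL, #0                   */
    }
    return ((uint64_t)ah << 32) | al;
}
```

Neither branch reads or writes the condition flags, and the choice between
them is made at compile time since n is a constant.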

In fact both of the above sequences are equally good for Thumb2.  If we lost
the RRX tweak it wouldn't be a major loss (we could even put it back as a
peephole2 to handle the common case where the condition code registers were
known to be dead).

R.
