Re: [PATCH][ARM] Thumb2 constant loading optimization

Richard Earnshaw Tue, 12 Apr 2011 03:02:16 -0700

On Wed, 2010-12-08 at 13:00 +0000, Andrew Stubbs wrote:
> Here is a patch I'd like reviewed for mainline GCC post 4.6. I don't 
> think it's suitable for stage 3.
> 
> At present, the support for constant loading via immediate operands (as 
> opposed to constant pools) is not well tuned for Thumb2. There are a few 
> separate issues:
> 
>     * 8-bit immediates can have arbitrary shifts applied, but are 
> currently limited to even shift offsets. (This appears to be a bug 
> rather than a deliberate state since half the support was there, but the 
> other half missing.)
> 
>     * Addw / subw support is completely missing.
> 
>     * Replicated constants are recognised, but constant splitting never 
> takes advantage of them.
> 
>     * Constants that can be inverted/negated are only identified by very 
> crude heuristics, sometimes harmful even in the existing code, and not 
> at all suited to replicated constants.
> 
> My patch addresses all of these issues. Here are some before and after 
> examples of generated code:
> 
> Example 1: subw
> 
>     a - 0xfff
> 
>     Before:
>           sub     r0, r0, #4064 ; 0xfe0
>           subs    r0, r0, #31   ; 0x01f
>     After:
>           subw    r0, r0, #4095 ; 0xfff
> 
> Example 2: addw
> 
>     a + 0xfffff
> 
>     Before:
>           movw    r3, #65535       ; 0x0ffff
>           movt    r3, 15           ; 0xf0000
>           adds    r3, r0, r3
>     After:
>           add     r0, r0, #1044480 ; 0xff000
>           addw    r0, r0, #4095    ; 0x00fff
> 
> Example 3: arbitrary shifts bug fix
> 
>     a - 0xfff1
> 
>     Before:
>           sub     r0, r0, #65024   ; 0xfe00
>           sub     r0, r0, #496     ; 0x01f0
>           sub     r0, r0, #1       ; 0x0001
>     After:
>           sub     r0, r0, #65280   ; 0xff00
>           sub     r0, r0, #241     ; 0x00f1
> 
> Example 4: 16-bit replicated patterns
> 
>     a + 0x44004401
> 
>     Before:
>           movw    r3, #17409          ; 0x00004401
>           movt    r3, 17408           ; 0x44000000
>           adds    r3, r0, r3
>     After:
>           add     r0, r0, #1140868096 ; 0x44004400
>           adds    r0, r0, #1          ; 0x00000001
> 
> Example 5: 32-bit replicated patterns
> 
>     a & 0xaaaaaa00
> 
>     Before:
>           mov     r3, #43520           ; 0x0000aa00
>           movt    r3, 43690            ; 0xaaaa0000
>           and     r3, r0, r3
>     After:
>           and     r0, r0, #-1431655766 ; 0xaaaaaaaa
>           bic     r0, r0, #170         ; 0x000000aa
> 
> The constant splitting code was duplicated in two places, and I needed 
> to find a new way to tackle the negated/inverted constants cases 
> (replicated constants render the old rather dumb heuristics completely 
> useless), so I have rearranged the code somewhat. Not as much has 
> changed as it looks like, but hopefully it's a bit easier to maintain 
> now, and I think it now reliably always chooses the most efficient sense 
> to encode the constant.
> 
> There is one point in this patch that I am uncertain about: I've renamed 
> the 'j' constraint to 'ja'. I did this because it allowed me to add jb 
> and jB which are similar, but different. The 'j' constraint was not 
> documented anywhere, but I think it is possible it may have been used by 
> third party code? Is this a problem?
> 
> Is the patch OK to commit, once stage 1 returns?
> 
> There are still a few extra optimizations that might be worth looking at 
> in future:
>     * For SET and PLUS operations only, there are some advantages in 
> splitting constants arithmetically, rather than purely bitwise (this is 
> not true of non-replicated constants so not relevant to ARM/Thumb1).
>       - e.g. 0x01010201 = 0x01010101 + 0x00000100
>     * 16-bit replicated constants when there isn't an exact match 
> (similar to the 32-bit constants used in this patch).
>     * something else?
> 
> Andrew



I'd be happier trying to review this patch if it could be broken down
into smaller hunks.  One immediate problem that stands out is:

 (define_insn_and_split "*arm_subsi3_insn"
-  [(set (match_operand:SI           0 "s_register_operand" "=r,r,rk,r,r")
-       (minus:SI (match_operand:SI 1 "reg_or_int_operand" "rI,r,k,?n,r")
-                 (match_operand:SI 2 "reg_or_int_operand" "r,rI,r, r,?n")))]
+  [(set (match_operand:SI           0 "s_register_operand" "=r,r,rk,r, k, r,r")
+       (minus:SI (match_operand:SI 1 "reg_or_int_operand" "rI,r,k, rk,k, ?n,r")
+                 (match_operand:SI 2 "reg_or_int_operand" "r,rI,r, 
jb,jb,r,?n")))]
   "TARGET_32BIT"
   "@
    rsb%?\\t%0, %2, %1
    sub%?\\t%0, %1, %2
    sub%?\\t%0, %1, %2
+   subw%?\\t%0, %1, %2
+   subw%?\\t%0, %1, %2
    #
    #"
   "&& ((GET_CODE (operands[1]) == CONST_INT
-               && !const_ok_for_arm (INTVAL (operands[1])))
+               && !(const_ok_for_arm (INTVAL (operands[1]))
+            || satisfies_constraint_jb (operands[2])))
        || (GET_CODE (operands[2]) == CONST_INT
-          && !const_ok_for_arm (INTVAL (operands[2]))))"
+          && !(const_ok_for_arm (INTVAL (operands[2]))
+               || satisfies_constraint_jb (operands[2]))))"
   [(clobber (const_int 0))]
   "
   arm_split_constant (MINUS, SImode, curr_insn,
                       INTVAL (operands[1]), operands[0], operands[2], 0);
   DONE;
   "
-  [(set_attr "length" "4,4,4,16,16")
+  [(set_attr "length" "4,4,4,4,4,16,16")
    (set_attr "predicable" "yes")]
 )
 

Firstly, MINUS (REG, CONST) is not canonical (the compiler should
generate PLUS (REG, -CONST)), so your additional patterns should, in
theory, never match.  If they are matching, then really we should work
out why and then fix those rather than adding more code here...

Secondly, these new alternatives seem to be unconditionally enabled.  It
looks as though you are relying on the constraints never matching when
not compiling for Thumb-2 and thus the alternative not being selected.
I'd prefer that the alternative was properly disabled when it wasn't
available by using the 'enabled' attribute.  The same applies to the
addition operation.

Also, your change to use a double-letter sequence beginning with 'j'
means any hand-written inline assembly code using a single 'j' will
break (that's a backwards compatibility issue for users); is there
really no other letter that can be used to prefix the operation?  Also,
think very carefully about whether your new constraints should be
internal only or really public.  I think it would be better to leave 'j'
as is, and add some more P-prefixed constraints that are internal only
for the additional cases.

A final note is that you may have missed some cases.  Now that we have
movw, 
        reg & ~(16-bit const)
can now be done in at most 2 insns:
        movw t1, #16-bit const
        bic  Rd, reg, t1

On thumb-2 you can also use ORN that way as well.

R.

R.

Re: [PATCH][ARM] Thumb2 constant loading optimization

Reply via email to