On Wed, 2010-12-08 at 13:00 +0000, Andrew Stubbs wrote: > Here is a patch I'd like reviewed for mainline GCC post 4.6. I don't > think it's suitable for stage 3. > > At present, the support for constant loading via immediate operands (as > opposed to constant pools) is not well tuned for Thumb2. There are a few > separate issues: > > * 8-bit immediates can have arbitrary shifts applied, but are > currently limited to even shift offsets. (This appears to be a bug > rather than a deliberate state since half the support was there, but the > other half missing.) > > * Addw / subw support is completely missing. > > * Replicated constants are recognised, but constant splitting never > takes advantage of them. > > * Constants that can be inverted/negated are only identified by very > crude heuristics, sometimes harmful even in the existing code, and not > at all suited to replicated constants. > > My patch addresses all of these issues. Here are some before and after > examples of generated code: > > Example 1: subw > > a - 0xfff > > Before: > sub r0, r0, #4064 ; 0xfe0 > subs r0, r0, #31 ; 0x01f > After: > subw r0, r0, #4095 ; 0xfff > > Example 2: addw > > a + 0xfffff > > Before: > movw r3, #65535 ; 0x0ffff > movt r3, 15 ; 0xf0000 > adds r3, r0, r3 > After: > add r0, r0, #1044480 ; 0xff000 > addw r0, r0, #4095 ; 0x00fff > > Example 3: arbitrary shifts bug fix > > a - 0xfff1 > > Before: > sub r0, r0, #65024 ; 0xfe00 > sub r0, r0, #496 ; 0x01f0 > sub r0, r0, #1 ; 0x0001 > After: > sub r0, r0, #65280 ; 0xff00 > sub r0, r0, #241 ; 0x00f1 > > Example 4: 16-bit replicated patterns > > a + 0x44004401 > > Before: > movw r3, #17409 ; 0x00004401 > movt r3, 17408 ; 0x44000000 > adds r3, r0, r3 > After: > add r0, r0, #1140868096 ; 0x44004400 > adds r0, r0, #1 ; 0x00000001 > > Example 5: 32-bit replicated patterns > > a & 0xaaaaaa00 > > Before: > mov r3, #43520 ; 0x0000aa00 > movt r3, 43690 ; 0xaaaa0000 > and r3, r0, r3 > After: > and r0, r0, #-1431655766 ; 0xaaaaaaaa > bic r0, r0, #170 ; 0x000000aa > > The constant splitting code was duplicated in two places, and I needed > to find a new way to tackle the negated/inverted constants cases > (replicated constants render the old rather dumb heuristics completely > useless), so I have rearranged the code somewhat. Not as much has > changed as it looks like, but hopefully it's a bit easier to maintain > now, and I think it now reliably always chooses the most efficient sense > to encode the constant. > > There is one point in this patch that I am uncertain about: I've renamed > the 'j' constraint to 'ja'. I did this because it allowed me to add jb > and jB which are similar, but different. The 'j' constraint was not > documented anywhere, but I think it is possible it may have been used by > third party code? Is this a problem? > > Is the patch OK to commit, once stage 1 returns? > > There are still a few extra optimizations that might be worth looking at > in future: > * For SET and PLUS operations only, there are some advantages in > splitting constants arithmetically, rather than purely bitwise (this is > not true of non-replicated constants so not relevant to ARM/Thumb1). > - e.g. 0x01010201 = 0x01010101 + 0x00000100 > * 16-bit replicated constants when there isn't an exact match > (similar to the 32-bit constants used in this patch). > * something else? > > Andrew
I'd be happier trying to review this patch if it could be broken down into smaller hunks. One immediate problem that stands out is: (define_insn_and_split "*arm_subsi3_insn" - [(set (match_operand:SI 0 "s_register_operand" "=r,r,rk,r,r") - (minus:SI (match_operand:SI 1 "reg_or_int_operand" "rI,r,k,?n,r") - (match_operand:SI 2 "reg_or_int_operand" "r,rI,r, r,?n")))] + [(set (match_operand:SI 0 "s_register_operand" "=r,r,rk,r, k, r,r") + (minus:SI (match_operand:SI 1 "reg_or_int_operand" "rI,r,k, rk,k, ?n,r") + (match_operand:SI 2 "reg_or_int_operand" "r,rI,r, jb,jb,r,?n")))] "TARGET_32BIT" "@ rsb%?\\t%0, %2, %1 sub%?\\t%0, %1, %2 sub%?\\t%0, %1, %2 + subw%?\\t%0, %1, %2 + subw%?\\t%0, %1, %2 # #" "&& ((GET_CODE (operands[1]) == CONST_INT - && !const_ok_for_arm (INTVAL (operands[1]))) + && !(const_ok_for_arm (INTVAL (operands[1])) + || satisfies_constraint_jb (operands[2]))) || (GET_CODE (operands[2]) == CONST_INT - && !const_ok_for_arm (INTVAL (operands[2]))))" + && !(const_ok_for_arm (INTVAL (operands[2])) + || satisfies_constraint_jb (operands[2]))))" [(clobber (const_int 0))] " arm_split_constant (MINUS, SImode, curr_insn, INTVAL (operands[1]), operands[0], operands[2], 0); DONE; " - [(set_attr "length" "4,4,4,16,16") + [(set_attr "length" "4,4,4,4,4,16,16") (set_attr "predicable" "yes")] ) Firstly, MINUS (REG, CONST) is not canonical (the compiler should generate PLUS (REG, -CONST)), so your additional patterns should, in theory, never match. If they are matching, then really we should work out why and then fix those rather than adding more code here... Secondly, these new alternatives seem to be unconditionally enabled. It looks as though you are relying on the constraints never matching when not compiling for Thumb-2 and thus the alternative not being selected. I'd prefer that the alternative was properly disabled when it wasn't available by using the 'enabled' attribute. The same applies to the addition operation. Also, your change to use a double-letter sequence beginning with 'j' means any hand-written inline assembly code using a single 'j' will break (that's a backwards compatibility issue for users); is there really no other letter that can be used to prefix the operation? Also, think very carefully about whether your new constraints should be internal only or really public. I think it would be better to leave 'j' as is, and add some more P-prefixed constraints that are internal only for the additional cases. A final note is that you may have missed some cases. Now that we have movw, reg & ~(16-bit const) can now be done in at most 2 insns: movw t1, #16-bit const bic Rd, reg, t1 On thumb-2 you can also use ORN that way as well. R. R.