Re: [PATCH][ARM] Thumb2 constant loading optimization

Andrew Stubbs Tue, 12 Apr 2011 02:22:25 -0700

Ping.

On 08/12/10 13:00, Andrew Stubbs wrote:

Here is a patch I'd like reviewed for mainline GCC post 4.6. I don't
think it's suitable for stage 3.


At present, the support for constant loading via immediate operands (as
opposed to constant pools) is not well tuned for Thumb2. There are a few
separate issues:

* In Thumb2, 8-bit immediates can have arbitrary shifts applied, but GCC
currently limits them to even shift offsets. (This appears to be a bug
rather than a deliberate choice, since half the support was there but
the other half was missing.)

* addw/subw support is completely missing.

* Replicated constants are recognised, but constant splitting never
takes advantage of them.

* Constants that can be inverted/negated are identified only by very
crude heuristics, which are sometimes harmful even in the existing code
and are not at all suited to replicated constants.
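
As a rough illustration of the first and third points, the sketch below
tests whether a 32-bit value fits a Thumb2 data-processing immediate. It
follows the ThumbExpandImm rules in the ARM Architecture Reference Manual
as commonly described; it is not code from the patch, and the function
name is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def is_thumb2_modified_immediate(value):
    """Return True if value fits a Thumb2 data-processing immediate (sketch)."""
    value &= MASK32
    # An 8-bit value shifted left by any amount, even or odd, without wrapping.
    for shift in range(25):
        if (value & ~(0xFF << shift) & MASK32) == 0:
            return True
    # Replicated byte patterns: 0x00XY00XY, 0xXY00XY00 and 0xXYXYXYXY.
    low = value & 0xFF
    high = (value >> 8) & 0xFF
    return value in (low * 0x00010001, high * 0x01000100, low * 0x01010101)
```

Under these assumptions 0x1fe (0xff shifted by an odd amount) and
0x44004400 are accepted, while 0xfff and 0x44004401 are not and must be
split or loaded some other way.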

My patch addresses all of these issues. Here are some before and after
examples of generated code:

Example 1: subw

a - 0xfff

Before:
sub r0, r0, #4064 ; 0xfe0
subs r0, r0, #31 ; 0x01f
After:
subw r0, r0, #4095 ; 0xfff

Example 2: addw

a + 0xfffff

Before:
movw r3, #65535 ; 0x0ffff
movt r3, 15 ; 0xf0000
adds r3, r0, r3
After:
add r0, r0, #1044480 ; 0xff000
addw r0, r0, #4095 ; 0x00fff
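
The arithmetic behind Examples 1 and 2 can be sketched as follows,
assuming addw/subw take a plain 12-bit immediate (0 to 4095) and that the
remaining high part must be an 8-bit value at some shift. The helper
names are hypothetical and this is not the splitting code from the patch.

```python
def fits_imm12(value):
    """addw/subw accept an unshifted 12-bit immediate."""
    return 0 <= value <= 0xFFF

def is_shifted_8bit(value):
    """True if value is an 8-bit value shifted left by 0..24 bits."""
    return any((value & ~(0xFF << s) & 0xFFFFFFFF) == 0 for s in range(25))

def split_with_addw(value):
    """Try value == high + low, with low for addw and high for a plain add."""
    low = value & 0xFFF
    high = value - low
    if fits_imm12(low) and is_shifted_8bit(high):
        return high, low
    return None
```

With these definitions, split_with_addw(0xfffff) yields the pair
(0xff000, 0xfff) used in the "After" sequence of Example 2, and 0xfff on
its own needs only the 12-bit part, as in Example 1.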

Example 3: arbitrary shifts bug fix

a - 0xfff1

Before:
sub r0, r0, #65024 ; 0xfe00
sub r0, r0, #496 ; 0x01f0
sub r0, r0, #1 ; 0x0001
After:
sub r0, r0, #65280 ; 0xff00
sub r0, r0, #241 ; 0x00f1

Example 4: 16-bit replicated patterns

a + 0x44004401

Before:
movw r3, #17409 ; 0x00004401
movt r3, 17408 ; 0x44000000
adds r3, r0, r3
After:
add r0, r0, #1140868096 ; 0x44004400
adds r0, r0, #1 ; 0x00000001

Example 5: 32-bit replicated patterns

a & 0xaaaaaa00

Before:
mov r3, #43520 ; 0x0000aa00
movt r3, 43690 ; 0xaaaa0000
and r3, r0, r3
After:
and r0, r0, #-1431655766 ; 0xaaaaaaaa
bic r0, r0, #170 ; 0x000000aa
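
The and/bic pair in Example 5 relies on the identity
0xaaaaaa00 == 0xaaaaaaaa & ~0xaa. A minimal sketch of how such a split
could be searched for is given below; it only considers 0xXYXYXYXY
patterns followed by a single unshifted 8-bit bic, the function name is
hypothetical, and it is not the algorithm used in the patch.

```python
MASK32 = 0xFFFFFFFF

def split_and_mask(mask):
    """Find (replicated, clear) such that mask == replicated & ~clear."""
    mask &= MASK32
    for byte in range(1, 256):
        replicated = byte * 0x01010101      # 0xXYXYXYXY pattern
        missing = mask & ~replicated        # bits the pattern fails to keep
        clear = replicated & ~mask          # bits a following bic must clear
        if missing == 0 and clear <= 0xFF:
            return replicated, clear
    return None
```

For 0xaaaaaa00 this returns (0xaaaaaaaa, 0xaa), matching the two
immediates in the "After" sequence (the listing prints 0xaaaaaaaa as the
signed value -1431655766).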

The constant splitting code was duplicated in two places, and I needed
to find a new way to tackle the negated/inverted constant cases
(replicated constants render the old, rather dumb heuristics completely
useless), so I have rearranged the code somewhat. Not as much has
changed as it might appear, but hopefully the code is a bit easier to
maintain now, and I think it now reliably chooses the most efficient
sense (plain, negated, or inverted) in which to encode the constant.

There is one point in this patch that I am uncertain about: I've renamed
the 'j' constraint to 'ja'. I did this so that I could add the 'jb' and
'jB' constraints, which are similar but distinct. The 'j' constraint was
not documented anywhere, but it is possible that third-party code uses
it. Is this a problem?

Is the patch OK to commit, once stage 1 returns?

There are still a few extra optimizations that might be worth looking at
in future:
* For SET and PLUS operations only, there are some advantages in
splitting constants arithmetically, rather than purely bitwise (this is
not true of non-replicated constants, so it is not relevant to
ARM/Thumb1).
- e.g. 0x01010201 = 0x01010101 + 0x00000100
* 16-bit replicated constants when there isn't an exact match (similar
to the handling of 32-bit replicated constants in this patch).
* something else?
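
To make the arithmetic-split idea in the first item concrete, here is a
small sketch that looks for a 0xXYXYXYXY pattern plus a shifted 8-bit
remainder. It is only an illustration of the 0x01010201 example above
under those assumptions, with hypothetical names, and says nothing about
how a real implementation would choose between candidate splits.

```python
MASK32 = 0xFFFFFFFF

def is_shifted_8bit(value):
    """True if value is an 8-bit value shifted left by 0..24 bits."""
    return any((value & ~(0xFF << s) & MASK32) == 0 for s in range(25))

def split_arithmetically(value):
    """Find (replicated, rest) with value == replicated + rest (mod 2**32)."""
    for byte in range(1, 256):
        replicated = byte * 0x01010101
        rest = (value - replicated) & MASK32
        if is_shifted_8bit(rest):
            return replicated, rest
    return None
```

Here split_arithmetically(0x01010201) gives (0x01010101, 0x100). A purely
bitwise split of the same value would have to both set bit 9 and clear
bit 8 of the replicated pattern.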

Andrew
