This series of patches tweaks the IRA handling of matched constraints
and earlyclobbers.  The main explanations are in the individual patches.

Tested on aarch64-linux-gnu (with and without SVE) and x86_64-linux-gnu.

I also tried building at least one target per CPU directory and
comparing the effect of the patches on the assembly output for
gcc.c-torture, gcc.dg and g++.dg using -O2 -ftree-vectorize.  The table
below summarises the effect on the number of lines of assembly, ignoring
tests for which the number of lines was the same:

Target                 Tests  Delta   Best  Worst Median
======                 =====  =====   ====  ===== ======
alpha-linux-gnu           87   -126    -96    138     -1
arm-linux-gnueabi         38    -37    -10      4     -1
arm-linux-gnueabihf       38    -37    -10      4     -1
avr-elf                   19    -64    -60     14     -1
bfin-elf                 143    -55    -21     21     -1
c6x-elf                   38    -32     -9     16     -1
cris-elf                 253  -1456   -192     24     -1
csky-elf                 101   -221    -36     26     -1
frv-linux-gnu             11    -23     -8     -1     -1
ft32-elf                   1     -2     -2     -2     -2
hppa64-hp-hpux11.23       66    -24    -12     12     -1
i686-apple-darwin         22    -45    -24     11     -1
i686-pc-linux-gnu         18    -65    -96     40     -1
ia64-linux-gnu             1     -4     -4     -4     -4
m68k-linux-gnu            83     31    -70     18      1
mcore-elf                 26   -122    -38     11     -2
mmix                      29   -110    -25      3     -1
mn10300-elf              399    258    -70     70      1
msp430-elf               120   1363    -13    833      2
pdp11                     37    -90    -92     25     -1
powerpc-ibm-aix7.0        31    -25     -4      3     -1
powerpc64-linux-gnu       31    -26     -2      2     -1
powerpc64le-linux-gnu     31    -26     -2      2     -1
pru-elf                    2      8      1      7      1
riscv32-elf                1     -2     -2     -2     -2
riscv64-elf                1     -2     -2     -2     -2
rl78-elf                   6    -20    -18      9     -3
rx-elf                   123     32    -58     30     -1
s390-linux-gnu             7     16     -6      9      1
s390x-linux-gnu            1     -3     -3     -3     -3
sh-linux-gnu             475  -4696   -843     42     -1
spu-elf                  168   -296   -114     25     -2
visium-elf               214   -936   -183     22     -1
x86_64-darwin             30    -25     -4      2     -1
x86_64-linux-gnu          28    -29     -4      1     -1

Of course, the number of lines is only a very rough guide to code size
and code size is only a very rough guide to performance.  It's just
a way of getting a feel for how invasive the change is in practice.

As often with this kind of comparison, quite a few changes in either
direction come from things that the RA doesn't consider, such as the
ability to merge code after RA.

The msp430-elf results are especially misleading.  The port has patterns
like:

;; Alternatives 2 and 3 are to handle cases generated by reload.
(define_insn "subqi3"
  [(set (match_operand:QI           0 "nonimmediate_operand" "=rYs,  rm,  &?r, ?&r")
        (minus:QI (match_operand:QI 1 "general_operand"       "0,    0,    !r,  !i")
                  (match_operand:QI 2 "general_operand"      " riYs, rmi, rmi,   r")))]
  ""
  "@
  SUB.B\t%2, %0
  SUB%X0.B\t%2, %0
  MOV%X0.B\t%1, %0 { SUB%X0.B\t%2, %0
  MOV%X0.B\t%1, %0 { SUB%X0.B\t%2, %0"
)

The patches make more use of the first two (cheap) alternatives
in preference to the third, but sometimes at the cost of introducing
moves elsewhere.  Each alternative counts one line in this test,
but the third alternative is really two instructions.

(If the port does actually want us to prefer the third alternative
over introducing moves, then I think the constraints need to be
changed.  Using "!" heavily disparages the alternative and so
it's reasonable for the optimisers to try hard to avoid it.
If the alternative is actually the preferred way of handling
untied operands then the "?" on operand 0 should be enough.)
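
As an untested sketch of what that might look like (purely for
illustration, not something I'm proposing for the port), the "!"s
could simply be dropped while keeping the "?"s:

;; Hypothetical, untested variant for illustration only.
(define_insn "subqi3"
  [(set (match_operand:QI           0 "nonimmediate_operand" "=rYs,  rm,  &?r, ?&r")
        (minus:QI (match_operand:QI 1 "general_operand"       "0,    0,     r,   i")
                  (match_operand:QI 2 "general_operand"      " riYs, rmi, rmi,   r")))]
  ...)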

The arm-* improvements come from patterns like:

(define_insn_and_split "*negdi2_insn"
  [(set (match_operand:DI         0 "s_register_operand" "=r,&r")
        (neg:DI (match_operand:DI 1 "s_register_operand"  "0,r")))
   (clobber (reg:CC CC_REGNUM))]
  "TARGET_32BIT"

The patches make IRA assign a saving of one full move to ties between
operands 0 and 1, whereas previously it would only assign a saving
of an eighth of a move.
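
As a rough illustration with made-up numbers: if a full register-register
move is costed at 8 units, the tie between operands 0 and 1 was previously
credited with a saving of only 8 / 8 = 1 unit, whereas it is now credited
with the full 8 units, so IRA is much more likely to put operands 0 and 1
in the same register and use the cheaper first alternative.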

The other big winners (e.g. cris-*, sh-* and visium-*) have similar cases.

I'll post the SVE patches that rely on and test for this later.

Thanks,
Richard
