This series of patches tweaks the IRA handling of matched constraints and earlyclobbers. The main explanations are in the individual patches.
Tested on aarch64-linux-gnu (with and without SVE) and x86_64-linux-gnu. I also tried building at least one target per CPU directory and comparing the effect of the patches on the assembly output for gcc.c-torture, gcc.dg and g++.dg using -O2 -ftree-vectorize. The table below summarises the effect on the number of lines of assembly, ignoring tests for which the number of lines was the same:

Target                 Tests  Delta  Best  Worst  Median
======                 =====  =====  ====  =====  ======
alpha-linux-gnu           87   -126   -96    138      -1
arm-linux-gnueabi         38    -37   -10      4      -1
arm-linux-gnueabihf       38    -37   -10      4      -1
avr-elf                   19    -64   -60     14      -1
bfin-elf                 143    -55   -21     21      -1
c6x-elf                   38    -32    -9     16      -1
cris-elf                 253  -1456  -192     24      -1
csky-elf                 101   -221   -36     26      -1
frv-linux-gnu             11    -23    -8     -1      -1
ft32-elf                   1     -2    -2     -2      -2
hppa64-hp-hpux11.23       66    -24   -12     12      -1
i686-apple-darwin         22    -45   -24     11      -1
i686-pc-linux-gnu         18    -65   -96     40      -1
ia64-linux-gnu             1     -4    -4     -4      -4
m68k-linux-gnu            83     31   -70     18       1
mcore-elf                 26   -122   -38     11      -2
mmix                      29   -110   -25      3      -1
mn10300-elf              399    258   -70     70       1
msp430-elf               120   1363   -13    833       2
pdp11                     37    -90   -92     25      -1
powerpc-ibm-aix7.0        31    -25    -4      3      -1
powerpc64-linux-gnu       31    -26    -2      2      -1
powerpc64le-linux-gnu     31    -26    -2      2      -1
pru-elf                    2      8     1      7       1
riscv32-elf                1     -2    -2     -2      -2
riscv64-elf                1     -2    -2     -2      -2
rl78-elf                   6    -20   -18      9      -3
rx-elf                   123     32   -58     30      -1
s390-linux-gnu             7     16    -6      9       1
s390x-linux-gnu            1     -3    -3     -3      -3
sh-linux-gnu             475  -4696  -843     42      -1
spu-elf                  168   -296  -114     25      -2
visium-elf               214   -936  -183     22      -1
x86_64-darwin             30    -25    -4      2      -1
x86_64-linux-gnu          28    -29    -4      1      -1

Of course, the number of lines is only a very rough guide to code size, and code size is only a very rough guide to performance. It's just a way of getting a feel for how invasive the change is in practice. As often with this kind of comparison, quite a few changes in either direction come from things that the RA doesn't consider, such as the ability to merge code after RA.

The msp430-elf results are especially misleading.
The port has patterns like:

  ;; Alternatives 2 and 3 are to handle cases generated by reload.
  (define_insn "subqi3"
    [(set (match_operand:QI 0 "nonimmediate_operand" "=rYs,  rm,  &?r, ?&r")
          (minus:QI (match_operand:QI 1 "general_operand" "0,    0,   !r,  !i")
                    (match_operand:QI 2 "general_operand" " riYs, rmi, rmi,  r")))]
    ""
    "@
    SUB.B\t%2, %0
    SUB%X0.B\t%2, %0
    MOV%X0.B\t%1, %0 { SUB%X0.B\t%2, %0
    MOV%X0.B\t%1, %0 { SUB%X0.B\t%2, %0"
  )

The patches make more use of the first two (cheap) alternatives in preference to the third, but sometimes at the cost of introducing moves elsewhere. Each alternative counts as one line in this test, but the third alternative is really two instructions.

(If the port does actually want us to prefer the third alternative over introducing moves, then I think the constraints need to be changed. Using "!" heavily disparages the alternative, so it's reasonable for the optimisers to try hard to avoid it. If the alternative is actually the preferred way of handling untied operands, then the "?" on operand 0 should be enough.)

The arm-* improvements come from patterns like:

  (define_insn_and_split "*negdi2_insn"
    [(set (match_operand:DI 0 "s_register_operand" "=r,&r")
          (neg:DI (match_operand:DI 1 "s_register_operand" "0,r")))
     (clobber (reg:CC CC_REGNUM))]
    "TARGET_32BIT"

The patches make IRA assign a saving of one full move to ties between operands 0 and 1, whereas previously it would only assign a saving of an eighth of a move. The other big winners (e.g. cris-*, sh-* and visium-*) have similar cases.

I'll post the SVE patches that rely on and test for this later.

Thanks,
Richard