IRA's make_early_clobber_and_input_conflicts checks for cases in which an output operand is likely to be an earlyclobber and an input operand is unlikely to be tieable with it. If so, the allocno for the output conflicts with the allocno for the input. This seems to work well.
However, a similar situation arises if an output operand is likely to be tied to one of a set of input operands X and if another input operand has a different value from all of the operands in X. E.g. if we have: 0: "=r, r" 1: "0, r" 2: "r, 0" 3: "r, r" operand 0 will always be tied to operand 1 or operand 2, so if operand 3 is different from them both, operand 0 acts like an earlyclobber as far as operand 3 (only) is concerned. The same is true for operand 2 in: 0: "=r" 1: "0" 2: "r" In the second example, we'd normally have a copy between operand 1 and operand 0 if operand 1 dies in the instruction, and so there's rarely a problem. But if operand 1 doesn't die in the instruction, operand 0 still acts as an earlyclobber for operand 2 (if different from operand 1), since in that case LRA must copy operand 1 to operand 0 before the instruction. As the existing comment says: Avoid introducing unnecessary conflicts by checking classes of the constraints and pseudos because otherwise significant code degradation is possible for some targets. I think that's doubly true here. E.g. it's perfectly reasonable to have constraints like: 0: "=r, r" 1: "0, r" 2: "r, r" on targets like s390 that have shorter instructions for tied operands, but that don't want the size difference to influence RA too much. We shouldn't treat operand 0 as earlyclobber wrt operand 2 in that case. This patch therefore treats a normal tied non-earlyclobber output as being effectively earlyclobber wrt to an input if it is so for *all* preferred alternatives. My usual bogo-comparison of gcc.c-torture, gcc.dg and g++.dg (this time using -Os -fno-schedule-insns{,2}) gives: Target Tests Delta Best Worst Median ====== ===== ===== ==== ===== ====== aarch64-linux-gnu 3 -3 -1 -1 -1 aarch64_be-linux-gnu 4 -4 -1 -1 -1 alpha-linux-gnu 136 -190 -56 84 -1 arc-elf 31 -172 -27 3 -2 arm-linux-gnueabi 59 -996 -136 4 -1 arm-linux-gnueabihf 59 -996 -136 4 -1 bfin-elf 22 -31 -19 8 -1 bpf-elf 276 -388 -191 12 -1 cris-elf 73 69 -18 26 -1 epiphany-elf 58 -91 -10 2 -1 fr30-elf 123 -156 -33 20 -1 h8300-elf 150 -426 -36 17 -2 hppa64-hp-hpux11.23 39 -65 -16 1 -1 i686-apple-darwin 93 -51 -29 26 -1 i686-pc-linux-gnu 43 8 -10 27 -1 m32r-elf 68 -92 -31 14 -1 m68k-linux-gnu 169 -65 -23 33 -1 mcore-elf 27 -29 -14 8 -1 mmix 25 -75 -28 2 -1 mn10300-elf 166 32 -46 149 -1 moxie-rtems 937 1461 -1649 6000 -1 msp430-elf 89 -1364 -835 5 -4 nds32le-elf 34 -54 -29 2 -1 pdp11 252 -458 -23 13 -1 powerpc-ibm-aix7.0 3 -4 -2 -1 -1 powerpc64-linux-gnu 1 -1 -1 -1 -1 powerpc64le-linux-gnu 3 -3 -1 -1 -1 rl78-elf 4 -12 -4 -2 -4 rx-elf 59 -99 -11 2 -1 s390-linux-gnu 115 -117 -53 21 -1 s390x-linux-gnu 120 -47 -25 21 -1 sh-linux-gnu 54 -89 -31 8 -1 sparc64-linux-gnu 14 -6 -5 4 -1 tilepro-linux-gnu 209 -452 -55 16 -1 v850-elf 10 18 -2 21 -1 vax-netbsdelf 5 -5 -1 -1 -1 x86_64-darwin 53 -62 -33 3 -1 x86_64-linux-gnu 52 -8 -8 13 -1 xstormy16-elf 144 -814 -541 25 -1 xtensa-elf 578 -2096 -138 15 -1 The eye-watering moxie-rtems +6,000 outlier is from gcc.dg/pr59992.c, where the same code template is instantiated 10,000 times. In some instances the code improves by one instruction, but it regresses by one instruction in many more. To emphasise that this is a poor metric (and is just to get a flavour), most of the {i686,x86_64}-linux-gnu LOC increases happen in frame info rather than code. I should try to script that out... Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? Richard 2019-09-30 Richard Sandiford <richard.sandif...@arm.com> gcc/ * ira-lives.c (check_and_make_def_conflict): Handle cases in which DEF is not a true earlyclobber but is tied to a specific input operand, and so is effectively earlyclobber wrt inputs that have different values. (make_early_clobber_and_input_conflicts): Pass this case to the above. Index: gcc/ira-lives.c =================================================================== --- gcc/ira-lives.c 2019-09-26 08:37:44.000000000 +0100 +++ gcc/ira-lives.c 2019-09-30 15:23:22.629428359 +0100 @@ -632,9 +632,28 @@ check_and_make_def_use_conflict (rtx dre /* Check and make if necessary conflicts for definition DEF of class DEF_CL of the current insn with input operands. Process only - constraints of alternative ALT. */ + constraints of alternative ALT. + + One of three things is true when this function is called: + + (1) DEF is an earlyclobber for alternative ALT. Input operands then + conflict with DEF in ALT unless they explicitly match DEF via 0-9 + constraints. + + (2) DEF matches (via 0-9 constraints) an operand that is an + earlyclobber for alternative ALT. Other input operands then + conflict with DEF in ALT. + + (3) [FOR_TIE_P] Some input operand X matches DEF for alternative ALT. + Input operands with a different value from X then conflict with + DEF in ALT. + + However, there's still a judgement call to make when deciding + whether a conflict in ALT is important enough to be reflected + in the pan-alternative allocno conflict set. */ static void -check_and_make_def_conflict (int alt, int def, enum reg_class def_cl) +check_and_make_def_conflict (int alt, int def, enum reg_class def_cl, + bool for_tie_p) { int use, use_match; ira_allocno_t a; @@ -668,14 +687,40 @@ check_and_make_def_conflict (int alt, in if (use == def || recog_data.operand_type[use] == OP_OUT) continue; + /* An earlyclobber on DEF doesn't apply to an input operand X if X + explicitly matches DEF, but it applies to other input operands + even if they happen to be the same value as X. + + In contrast, if an input operand X is tied to a non-earlyclobber + DEF, there's no conflict with other input operands that have the + same value as X. */ + if (op_alt[use].matches == def + || (for_tie_p + && rtx_equal_p (recog_data.operand[use], + recog_data.operand[op_alt[def].matched]))) + continue; + if (op_alt[use].anything_ok) use_cl = ALL_REGS; else use_cl = op_alt[use].cl; + if (use_cl == NO_REGS) + continue; + + /* If DEF is simply a tied operand, ignore cases in which this + alternative requires USE to have a likely-spilled class. + Adding a conflict would just constrain USE further if DEF + happens to be allocated first. */ + if (for_tie_p && targetm.class_likely_spilled_p (use_cl)) + continue; /* If there's any alternative that allows USE to match DEF, do not record a conflict. If that causes us to create an invalid - instruction due to the earlyclobber, reload must fix it up. */ + instruction due to the earlyclobber, reload must fix it up. + + Likewise, if we're treating a tied DEF like a partial earlyclobber, + do not record a conflict if there's another alternative in which + DEF is neither tied nor earlyclobber. */ for (alt1 = 0; alt1 < recog_data.n_alternatives; alt1++) { if (!TEST_BIT (preferred_alternatives, alt1)) @@ -690,6 +735,12 @@ check_and_make_def_conflict (int alt, in && recog_data.constraints[use - 1][0] == '%' && op_alt1[use - 1].matches == def)) break; + if (for_tie_p + && !op_alt1[def].earlyclobber + && op_alt1[def].matched < 0 + && alternative_class (op_alt1, def) != NO_REGS + && alternative_class (op_alt1, use) != NO_REGS) + break; } if (alt1 < recog_data.n_alternatives) @@ -700,8 +751,7 @@ check_and_make_def_conflict (int alt, in if ((use_match = op_alt[use].matches) >= 0) { - if (use_match == def) - continue; + gcc_checking_assert (use_match != def); if (op_alt[use_match].anything_ok) use_cl = ALL_REGS; @@ -716,7 +766,11 @@ check_and_make_def_conflict (int alt, in /* Make conflicts of early clobber pseudo registers of the current insn with its inputs. Avoid introducing unnecessary conflicts by checking classes of the constraints and pseudos because otherwise - significant code degradation is possible for some targets. */ + significant code degradation is possible for some targets. + + For these purposes, tying an input to an output makes that output act + like an earlyclobber for inputs with a different value, since the output + register then has a predetermined purpose on input to the instruction. */ static void make_early_clobber_and_input_conflicts (void) { @@ -731,15 +785,19 @@ make_early_clobber_and_input_conflicts ( if (TEST_BIT (preferred_alternatives, alt)) for (def = 0; def < n_operands; def++) { - def_cl = NO_REGS; - if (op_alt[def].earlyclobber) + if (op_alt[def].anything_ok) + def_cl = ALL_REGS; + else + def_cl = op_alt[def].cl; + if (def_cl != NO_REGS) { - if (op_alt[def].anything_ok) - def_cl = ALL_REGS; - else - def_cl = op_alt[def].cl; - check_and_make_def_conflict (alt, def, def_cl); + if (op_alt[def].earlyclobber) + check_and_make_def_conflict (alt, def, def_cl, false); + else if (op_alt[def].matched >= 0 + && !targetm.class_likely_spilled_p (def_cl)) + check_and_make_def_conflict (alt, def, def_cl, true); } + if ((def_match = op_alt[def].matches) >= 0 && (op_alt[def_match].earlyclobber || op_alt[def].earlyclobber)) @@ -748,7 +806,7 @@ make_early_clobber_and_input_conflicts ( def_cl = ALL_REGS; else def_cl = op_alt[def_match].cl; - check_and_make_def_conflict (alt, def, def_cl); + check_and_make_def_conflict (alt, def, def_cl, false); } } }