Alex Coplan <alex.cop...@arm.com> writes:
> Hello,
>
> Since r11-2903-g6b3034eaba83935d9f6dfb20d2efbdb34b5b00bf introduced a
> canonicalization from mult to shift on address reloads, a missing
> pattern in the AArch64 backend was exposed.
>
> In particular, we were missing the ashift variant of
> *add_<optab><mode>_multp2 (this mult variant is redundant and was
> removed in r11-3033-g2f8ae301f6a125f50b0a758047fcddae7b68daa8).
>
> This patch adds that missing pattern (fixing PR96998), updates
> aarch64_is_extend_from_extract() to work for the shift pattern (instead
> of the redundant mult variant) and updates callers in the cost
> calculations to apply the costing for the shift variant instead.
Hmm. I think we should instead fix this in combine. For the related
testcase:

void g(void) {
  int g;
  for (;; g = h()) {
    asm volatile ("" :: "r"(&d.e[g]));
  }
}

combine tries:

Trying 9 -> 10:
    9: r99:DI=sign_extend(r93:SI)<<0x2
      REG_DEAD r93:SI
   10: r100:DI=r96:DI+r99:DI
      REG_DEAD r99:DI
Successfully matched this instruction:
(set (reg:DI 100)
    (plus:DI (ashift:DI (sign_extend:DI (reg/v:SI 93 [ g ]))
            (const_int 2 [0x2]))
        (reg/f:DI 96)))
allowing combination of insns 9 and 10
original costs 4 + 4 = 8
replacement cost 8
deferring deletion of insn with uid = 9.
modifying insn i3    10: r100:DI=sign_extend(r93:SI)<<0x2+r96:DI
      REG_DEAD r93:SI
deferring rescan insn with uid = 10.

where (shift (extend …) …) is IMO the correct representation of this
operation.

The new pattern comes from this code in make_extraction:

  else if (GET_CODE (inner) == ASHIFT
           && CONST_INT_P (XEXP (inner, 1))
           && pos_rtx == 0 && pos == 0
           && len > UINTVAL (XEXP (inner, 1)))
    {
      /* We're extracting the least significant bits of an rtx
         (ashift X (const_int C)), where LEN > C.  Extract the
         least significant (LEN - C) bits of X, giving an rtx
         whose mode is MODE, then shift it left C times.  */
      new_rtx = make_extraction (mode, XEXP (inner, 0),
                                 0, 0, len - INTVAL (XEXP (inner, 1)),
                                 unsignedp, in_dest, in_compare);
      if (new_rtx != 0)
        return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
    }

But because of the unfortunate special case that shifts need to be
MULTs inside MEMs, make_compound_operation_int has:

    case ASHIFT:
      /* Convert shifts by constants into multiplications if inside
         an address.  */
      if (in_code == MEM && CONST_INT_P (XEXP (x, 1))
          && INTVAL (XEXP (x, 1)) < HOST_BITS_PER_WIDE_INT
          && INTVAL (XEXP (x, 1)) >= 0)
        {
          HOST_WIDE_INT count = INTVAL (XEXP (x, 1));
          HOST_WIDE_INT multval = HOST_WIDE_INT_1 << count;

          new_rtx = make_compound_operation (XEXP (x, 0), next_code);
          if (GET_CODE (new_rtx) == NEG)
            {
              new_rtx = XEXP (new_rtx, 0);
              multval = -multval;
            }
          multval = trunc_int_for_mode (multval, mode);
          new_rtx = gen_rtx_MULT (mode, new_rtx,
                                  gen_int_mode (multval, mode));
        }
      break;

Thus for:

    case ASHIFTRT:
      lhs = XEXP (x, 0);
      rhs = XEXP (x, 1);

      /* If we have (ashiftrt (ashift foo C1) C2) with C2 >= C1,
         this is a SIGN_EXTRACT.  */
      if (CONST_INT_P (rhs)
          && GET_CODE (lhs) == ASHIFT
          && CONST_INT_P (XEXP (lhs, 1))
          && INTVAL (rhs) >= INTVAL (XEXP (lhs, 1))
          && INTVAL (XEXP (lhs, 1)) >= 0
          && INTVAL (rhs) < mode_width)
        {
          new_rtx = make_compound_operation (XEXP (lhs, 0), next_code);
          new_rtx = make_extraction (mode, new_rtx,
                                     INTVAL (rhs) - INTVAL (XEXP (lhs, 1)),
                                     NULL_RTX, mode_width - INTVAL (rhs),
                                     code == LSHIFTRT, 0, in_code == COMPARE);
          break;
        }

the first new_rtx is a MULT rather than an ASHIFT when the operation
occurs inside a MEM.  make_extraction then doesn't do the same
canonicalisation as it does for ASHIFT above.

So for the original testcase we get:

Trying 8 -> 9:
    8: r98:DI=sign_extend(r92:SI)
      REG_DEAD r92:SI
    9: [r98:DI*0x4+r96:DI]=asm_operands
      REG_DEAD r98:DI
Failed to match this instruction:
(set (mem:SI (plus:DI (sign_extract:DI (mult:DI (subreg:DI (reg/v:SI 92 [ g ]) 0)
                    (const_int 4 [0x4]))
                (const_int 34 [0x22])
                (const_int 0 [0]))
            (reg/f:DI 96)) [3 *i_5+0 S4 A32])
    (asm_operands:SI ("") ("=Q") 0 [] [] [] /tmp/foo.c:13))
allowing combination of insns 8 and 9
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 8.
modifying insn i3     9: [sign_extract(r92:SI#0*0x4,0x22,0)+r96:DI]=asm_operands
      REG_DEAD r92:SI
deferring rescan insn with uid = 9.
starting the processing of deferred insns
rescanning insn with uid = 9.
ending the processing of deferred insns

So IMO make_extraction should recognise and handle the “mem form”
of the shift too.
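A minimal, untested sketch of what that might look like, mirroring the
ASHIFT case quoted above (the local "shift" variable and the exact guards
are only illustrative, not a finished patch):

  /* Sketch only: handle the in-MEM form (mult X 2**C) the same way
     as (ashift X C) is handled above.  */
  else if (GET_CODE (inner) == MULT
           && CONST_INT_P (XEXP (inner, 1))
           && exact_log2 (UINTVAL (XEXP (inner, 1))) > 0
           && pos_rtx == 0 && pos == 0
           && len > (unsigned HOST_WIDE_INT) exact_log2 (UINTVAL (XEXP (inner, 1))))
    {
      int shift = exact_log2 (UINTVAL (XEXP (inner, 1)));

      /* Extract the least significant (LEN - SHIFT) bits of X, then
         scale the result back up, keeping the MULT form that is
         required inside MEMs.  */
      new_rtx = make_extraction (mode, XEXP (inner, 0),
                                 0, 0, len - shift,
                                 unsignedp, in_dest, in_compare);
      if (new_rtx != 0)
        return gen_rtx_MULT (mode, new_rtx, XEXP (inner, 1));
    }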
It's all a bit unfortunate really.  Having different representations
for shifts inside and outside MEMs is the Second Great RTL Mistake.
(The first was modeless const_ints. :-))

If we do that, we should be able to remove the handling of
extract-based addresses in aarch64_classify_index & co.

Thanks,
Richard

>
> Testing:
>  * Bootstrap and regtest on aarch64-none-linux-gnu, no regressions.
>  * Checked new unit test passes on arm-none-linux-gnueabihf.
>  * Checked new unit test isn't run on x86 (inline asm uses
>    arm/aarch64-specific constraint).
>  * Tested linux-next tree no longer encounters this ICE on arm64 with
>    patched GCC (unfortunately still doesn't compile successfully due to
>    a fix for PR96475 which introduces an ICE on arm64).
>
> OK for trunk?
>
> Thanks,
> Alex
>
> ---
>
> gcc/ChangeLog:
>
> 	PR target/96998
> 	* config/aarch64/aarch64.c (aarch64_is_extend_from_extract): Update to
> 	work for shift pattern instead of mult.
> 	(aarch64_strip_extend): Cost extract+shift pattern instead of
> 	now-removed extract+mult pattern.
> 	(aarch64_rtx_arith_op_extract_p): Likewise.
> 	* config/aarch64/aarch64.md (*add_<optab><mode>_shftex): New.
>
> gcc/testsuite/ChangeLog:
>
> 	PR target/96998
> 	* gcc.c-torture/compile/pr96998.c: New test.
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index b6d74496cd0..55e0fc4e683 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -2815,27 +2815,24 @@ aarch64_is_noplt_call_p (rtx sym)
>     represent an expression that matches an extend operation.  The
>     operands represent the parameters from
>
> -   (extract:MODE (mult (reg) (MULT_IMM)) (EXTRACT_IMM) (const_int 0)).  */
> +   (extract:MODE (ashift (reg) (SHIFT_IMM)) (EXTRACT_IMM) (const_int 0)).  */
>  bool
> -aarch64_is_extend_from_extract (scalar_int_mode mode, rtx mult_imm,
> +aarch64_is_extend_from_extract (scalar_int_mode mode, rtx shift_imm,
>  				rtx extract_imm)
>  {
> -  HOST_WIDE_INT mult_val, extract_val;
> +  HOST_WIDE_INT shift_val, extract_val;
>
> -  if (! CONST_INT_P (mult_imm) || ! CONST_INT_P (extract_imm))
> +  if (! CONST_INT_P (shift_imm) || ! CONST_INT_P (extract_imm))
>      return false;
>
> -  mult_val = INTVAL (mult_imm);
> +  shift_val = INTVAL (shift_imm);
>    extract_val = INTVAL (extract_imm);
>
> -  if (extract_val > 8
> -      && extract_val < GET_MODE_BITSIZE (mode)
> -      && exact_log2 (extract_val & ~7) > 0
> -      && (extract_val & 7) <= 4
> -      && mult_val == (1 << (extract_val & 7)))
> -    return true;
> -
> -  return false;
> +  return extract_val > 8
> +	 && extract_val < GET_MODE_BITSIZE (mode)
> +	 && exact_log2 (extract_val & ~7) > 0
> +	 && shift_val <= 4
> +	 && shift_val == (extract_val & 7);
>  }
>
>  /* Emit an insn that's a simple single-set.  Both the operands must be
> @@ -11262,7 +11259,7 @@ aarch64_strip_extend (rtx x, bool strip_shift)
>    /* Zero and sign extraction of a widened value.  */
>    if ((GET_CODE (op) == ZERO_EXTRACT || GET_CODE (op) == SIGN_EXTRACT)
>        && XEXP (op, 2) == const0_rtx
> -      && GET_CODE (XEXP (op, 0)) == MULT
> +      && GET_CODE (XEXP (op, 0)) == ASHIFT
>        && aarch64_is_extend_from_extract (mode, XEXP (XEXP (op, 0), 1),
>  					 XEXP (op, 1)))
>      return XEXP (XEXP (op, 0), 0);
> @@ -11617,7 +11614,7 @@ aarch64_rtx_arith_op_extract_p (rtx x, scalar_int_mode mode)
>        rtx op1 = XEXP (x, 1);
>        rtx op2 = XEXP (x, 2);
>
> -      if (GET_CODE (op0) == MULT
> +      if (GET_CODE (op0) == ASHIFT
>  	  && CONST_INT_P (op1)
>  	  && op2 == const0_rtx
>  	  && CONST_INT_P (XEXP (op0, 1))
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index dbc6b1db176..4bb7b318b99 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -2471,6 +2471,19 @@ (define_insn "*add_<optab><ALLX:mode>_<GPI:mode>"
>    [(set_attr "type" "alu_ext")]
>  )
>
> +(define_insn "*add_<optab><mode>_shftex"
> +  [(set (match_operand:GPI 0 "register_operand" "=rk")
> +	(plus:GPI (ANY_EXTRACT:GPI
> +		    (ashift:GPI (match_operand:GPI 1 "register_operand" "r")
> +				(match_operand 2 "aarch64_shift_imm_<mode>" "n"))
> +		    (match_operand 3 "const_int_operand" "n")
> +		    (const_int 0))
> +		  (match_operand:GPI 4 "register_operand" "r")))]
> +  "aarch64_is_extend_from_extract (<MODE>mode, operands[2], operands[3])"
> +  "add\\t%<w>0, %<w>4, %<w>1, <su>xt%e3 %2"
> +  [(set_attr "type" "alu_ext")]
> +)
> +
>  ;; zero_extend version of above
>  (define_insn "*add_<optab><SHORT:mode>_si_uxtw"
>    [(set (match_operand:DI 0 "register_operand" "=rk")
> diff --git a/gcc/testsuite/gcc.c-torture/compile/pr96998.c b/gcc/testsuite/gcc.c-torture/compile/pr96998.c
> new file mode 100644
> index 00000000000..a75d5dcfe08
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/pr96998.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile { target arm*-*-* aarch64*-*-* } } */
> +
> +int h(void);
> +struct c d;
> +struct c {
> +  int e[1];
> +};
> +
> +void f(void) {
> +  int g;
> +  for (;; g = h()) {
> +    int *i = &d.e[g];
> +    asm("" : "=Q"(*i));
> +  }
> +}
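
[Editorial note: a worked instance of the arithmetic that
aarch64_is_extend_from_extract checks, using the values from the
failed-match dump above (extract_val = 34 from the sign_extract,
shift_val = 2 from the mult by 4).  Purely illustrative, not part of
the patch:

  #include <assert.h>

  int main (void)
  {
    /* From (sign_extract (mult x 4) 34 0): 34 decomposes as 32 + 2.  */
    int extract_val = 34;
    int shift_val = 2;                          /* log2 of the mult constant 4 */

    assert (extract_val > 8);                   /* more than a bare byte extract */
    assert ((extract_val & ~7) == 32);          /* power of two: extend from 32 bits */
    assert ((extract_val & 7) == shift_val);    /* low bits encode the shift */
    assert (shift_val <= 4);                    /* max shift in an extended-register add */
    return 0;
  }

i.e. the rtx represents a sign-extend from 32 bits followed by a left
shift by 2, which the new *add_<optab><mode>_shftex pattern can emit as
a single extended-register add (sxtw form with shift 2).]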