Jonathan Wright <jonathan.wri...@arm.com> writes: > Hi, > > As a general principle, vec_duplicate should be as close to the root > of an expression as possible. Where unary operations have > vec_duplicate as an argument, these operations should be pushed > inside the vec_duplicate. > > This patch modifies unary operation simplification to push > sign/zero-extension of a scalar inside vec_duplicate. > > This patch also updates all RTL patterns in aarch64-simd.md to use > the new canonical form. > > Regression tested and bootstrapped on aarch64-none-linux-gnu and > x86_64-none-linux-gnu - no issues. > > Ok for master? > > Thanks, > Jonathan > > --- > > gcc/ChangeLog: > > 2021-07-19 Jonathan Wright <jonathan.wri...@arm.com> > > * config/aarch64/aarch64-simd.md: Push sign/zero-extension > inside vec_duplicate for all patterns. > * simplify-rtx.c (simplify_context::simplify_unary_operation_1): > Push sign/zero-extension inside vec_duplicate. > > diff --git a/gcc/config/aarch64/aarch64-simd.md > b/gcc/config/aarch64/aarch64-simd.md > index > 74890989cb3045798bf8d0241467eaaf72238297..99a95a54248041906b9a0ad742d3a0dca9733b35 > 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -2092,14 +2092,14 @@ > > (define_insn "aarch64_<su>mlal_hi_n<mode>_insn" > [(set (match_operand:<VWIDE> 0 "register_operand" "=w") > - (plus:<VWIDE> > - (mult:<VWIDE> > - (ANY_EXTEND:<VWIDE> (vec_select:<VHALF> > - (match_operand:VQ_HSI 2 "register_operand" "w") > - (match_operand:VQ_HSI 3 "vect_par_cnst_hi_half" ""))) > - (ANY_EXTEND:<VWIDE> (vec_duplicate:<VCOND> > - (match_operand:<VEL> 4 "register_operand" "<h_con>")))) > - (match_operand:<VWIDE> 1 "register_operand" "0")))] > + (plus:<VWIDE> > + (mult:<VWIDE> > + (ANY_EXTEND:<VWIDE> (vec_select:<VHALF> > + (match_operand:VQ_HSI 2 "register_operand" "w") > + (match_operand:VQ_HSI 3 "vect_par_cnst_hi_half" ""))) > + (vec_duplicate:<VWIDE> (ANY_EXTEND:<VWIDE_S> > + (match_operand:<VEL> 4 "register_operand" "<h_con>")))) > + (match_operand:<VWIDE> 1 "register_operand" "0")))]
Sorry to nitpick, since this is pre-existing, but I think the pattern would be easier to read with one operation per line. I.e.: (plus:<VWIDE> (mult:<VWIDE> (ANY_EXTEND:<VWIDE> (vec_select:<VHALF> (match_operand:VQ_HSI 2 "register_operand" "w") (match_operand:VQ_HSI 3 "vect_par_cnst_hi_half" ""))) (vec_duplicate:<VWIDE> (ANY_EXTEND:<VWIDE_S> (match_operand:<VEL> 4 "register_operand" "<h_con>")))) (match_operand:<VWIDE> 1 "register_operand" "0")))] Same for the other patterns with similar doubling of operators. (It looks like you've fixed other indentation problems though, thanks.) > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c > index > 2d169d3f9f70c85d396adaed124b6c52aca98f07..f885816412f7576d2535f827562d2b425a6a553b > 100644 > --- a/gcc/simplify-rtx.c > +++ b/gcc/simplify-rtx.c > @@ -903,6 +903,18 @@ simplify_context::simplify_unary_operation_1 (rtx_code > code, machine_mode mode, > rtx temp, elt, base, step; > scalar_int_mode inner, int_mode, op_mode, op0_mode; > > + /* Extending a VEC_DUPLICATE of a scalar should be canonicalized to a > + VEC_DUPLICATE of an extended scalar. This is outside of the main switch > + as we may wish to push all unary operations inside VEC_DUPLICATE. */ > + if ((code == SIGN_EXTEND || code == ZERO_EXTEND) > + && GET_CODE (op) == VEC_DUPLICATE > + && GET_MODE_NUNITS (GET_MODE (XEXP (op, 0))).to_constant () == 1) > + { > + rtx x = simplify_gen_unary (code, GET_MODE_INNER (mode), > + XEXP (op, 0), GET_MODE (XEXP (op, 0))); > + return gen_vec_duplicate (mode, x); > + } > + > switch (code) > { > case NOT: This is really an extension of the existing code: if (VECTOR_MODE_P (mode) && vec_duplicate_p (op, &elt) && code != VEC_DUPLICATE) { /* Try applying the operator to ELT and see if that simplifies. We can duplicate the result if so. The reason we don't use simplify_gen_unary is that it isn't necessarily a win to convert things like: (neg:V (vec_duplicate:V (reg:S R))) to: (vec_duplicate:V (neg:S (reg:S R))) The first might be done entirely in vector registers while the second might need a move between register files. */ temp = simplify_unary_operation (code, GET_MODE_INNER (mode), elt, GET_MODE_INNER (GET_MODE (op))); if (temp) return gen_vec_duplicate (mode, temp); } As we discussed off-list, the original motivation of this code was to support folding operations on variable-length vectors. We didn't want to change the canonical form for everyone at the same time. Now that we've agreed what the canonical form should be (on IRC), the long-term goal should be change the simplify_unary_operation into a simplify_gen_unary. I understand you might not want to do that yet, but I think we should at least integrate the new code with the existing code. E.g.: if (code == SIGN_EXTEND || code == ZERO_EXTEND) temp = simplify_gen_unary (code, GET_MODE_INNER (mode), elt, GET_MODE_INNER (GET_MODE (op))); else temp = simplify_unary_operation (code, GET_MODE_INNER (mode), elt, GET_MODE_INNER (GET_MODE (op))); Thanks, Richard