Jonathan Wright <jonathan.wri...@arm.com> writes:
> Hi,
>
> The Neon multiply/multiply-accumulate/multiply-subtract instructions
> can take various forms - multiplying full vector registers of values
> or multiplying one vector by a single element of another. Regardless
> of the form used, these instructions have the same cost, and this
> should be reflected by the RTL cost function.
>
> This patch adds RTL tree traversal in the Neon multiply cost function
> to match the vec_select used by the lane-referencing forms of the
> instructions already mentioned. This traversal prevents the cost of
> the vec_select from being added into the cost of the multiply -
> meaning that these instructions can now be emitted in the combine
> pass as they are no longer deemed prohibitively expensive.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-07-19  Jonathan Wright  <jonathan.wri...@arm.com>
>
>         * config/aarch64/aarch64.c (aarch64_rtx_mult_cost): Traverse
>         RTL tree to prevent vec_select from being added into Neon
>         multiply cost.
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index f5b25a7f7041645921e6ad85714efda73b993492..b368303b0e699229266e6d008e28179c496bf8cd 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -11985,6 +11985,21 @@ aarch64_rtx_mult_cost (rtx x, enum rtx_code code, int outer, bool speed)
>         op0 = XEXP (op0, 0);
>        else if (GET_CODE (op1) == VEC_DUPLICATE)
>         op1 = XEXP (op1, 0);
> +      /* The same argument applies to the VEC_SELECT when using the lane-
> +        referencing forms of the MUL/MLA/MLS instructions.  Without the
> +        traversal here, the combine pass deems these patterns too
> +        expensive and subsequently does not emit the lane-referencing
> +        forms of the instructions.  In addition, canonical form is for the
> +        VEC_SELECT to be the second argument of the multiply - thus only
> +        op1 is traversed.  */
> +      if (GET_CODE (op1) == VEC_SELECT
> +         && GET_MODE_NUNITS (GET_MODE (op1)).to_constant () == 1)
> +       op1 = XEXP (op1, 0);
> +      else if ((GET_CODE (op1) == ZERO_EXTEND
> +               || GET_CODE (op1) == SIGN_EXTEND)
> +              && GET_CODE (XEXP (op1, 0)) == VEC_SELECT
> +              && GET_MODE_NUNITS (GET_MODE (op1)).to_constant () == 1)
> +       op1 = XEXP (XEXP (op1, 0), 0);
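For reference, the lane-referencing vector multiply that this hunk is trying to cost looks roughly like this in RTL (a hand-written sketch for illustration, not a compiler dump; the register names are made up):

```
;; mul v0.4s, v1.4s, v2.s[0]
(mult:V4SI
  (reg:V4SI v1)
  (vec_duplicate:V4SI
    (vec_select:SI (reg:V4SI v2)
                   (parallel [(const_int 0)]))))
```

The scalar-mode vec_select (with an optional sign/zero extend for the widening forms) is what the new conditions match, so that its cost is not added on top of the multiply's.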
I think this logically belongs in the “GET_CODE (op1) == VEC_DUPLICATE”
if block, since the condition is never true otherwise.

We can probably skip the GET_MODE_NUNITS tests, but if you'd prefer to
keep them, I think it would be better to add them to the existing
VEC_DUPLICATE tests rather than restrict them to the VEC_SELECT ones.

Also, although this is in Advanced SIMD-specific code, I think it'd be
better to use:

  is_a<scalar_mode> (GET_MODE (op1))

instead of:

  GET_MODE_NUNITS (GET_MODE (op1)).to_constant () == 1

Do you have a testcase?

Thanks,
Richard