On Fri, Sep 15, 2017 at 09:15:17AM +0100, Michael Collison wrote: > Patch updated with comments to simplify pattern .from Richard Sandiford. Okay > for trunk?
OK. Thanks, James > > 2017-09-14 Michael Collison <michael.colli...@arm.com> > > * config/aarch64/aarch64.md (*aarch64_reg_<optab>_minus<mode>3): > New pattern. > > 2017-09-14 Michael Collison <michael.colli...@arm.com> > > * gcc.target/aarch64/var_shift_mask_2.c: New test. > > Richard Sandiford <richard.sandif...@linaro.org> writes: > > Richard Sandiford <richard.sandif...@linaro.org> writes: > >> Michael Collison <michael.colli...@arm.com> writes: > >>> Richard, > >>> > >>> The problem with this approach for Aarch64 is that > >>> TARGET_SHIFT_TRUNCATION_MASK is based on SHIFT_COUNT_TRUNCATED which > >>> is normally 0 as it based on the TARGET_SIMD flag. > >> > >> Maybe I'm wrong, but that seems like a missed optimisation in itself. > > Sorry to follow up on myself yet again, but I'd forgotten this was because we > allow the SIMD unit to do scalar shifts. So I guess we have no choice, even > though it seems unfortunate. > > > +(define_insn_and_split "*aarch64_reg_<optab>_minus<mode>3" > > + [(set (match_operand:GPI 0 "register_operand" "=&r") > > + (ASHIFT:GPI > > + (match_operand:GPI 1 "register_operand" "r") > > + (minus:QI (match_operand 2 "const_int_operand" "n") > > + (match_operand:QI 3 "register_operand" "r"))))] > > + "INTVAL (operands[2]) == GET_MODE_BITSIZE (<MODE>mode)" > > + "#" > > + "&& true" > > + [(const_int 0)] > > + { > > + /* Handle cases where operand 3 is a plain QI register, or > > + a subreg with either a SImode or DImode register. */ > > + > > + rtx subreg_tmp = (REG_P (operands[3]) > > + ? gen_lowpart_SUBREG (SImode, operands[3]) > > + : SUBREG_REG (operands[3])); > > + > > + if (REG_P (subreg_tmp) && GET_MODE (subreg_tmp) == DImode) > > + subreg_tmp = gen_lowpart_SUBREG (SImode, subreg_tmp); > > I think this all simplifies to: > > rtx subreg_tmp = gen_lowpart (SImode, operands[3]); > > (or it would be worth having a comment that explains why not). > As well as being shorter, it will properly simplify hard REGs to new hard > REGs. > > > + rtx tmp = (can_create_pseudo_p () ? gen_reg_rtx (SImode) > > + : operands[0]); > > + > > + if (<MODE>mode == DImode && !can_create_pseudo_p ()) > > + tmp = gen_lowpart_SUBREG (SImode, operands[0]); > > I think this too would be simpler with gen_lowpart: > > rtx tmp = (can_create_pseudo_p () ? gen_reg_rtx (SImode) > : gen_lowpart (SImode, operands[0])); > > > + > > + emit_insn (gen_negsi2 (tmp, subreg_tmp)); > > + > > + rtx and_op = gen_rtx_AND (SImode, tmp, > > + GEN_INT (GET_MODE_BITSIZE (<MODE>mode) - 1)); > > + > > + rtx subreg_tmp2 = gen_lowpart_SUBREG (QImode, and_op); > > + > > + emit_insn (gen_<optab><mode>3 (operands[0], operands[1], subreg_tmp2)); > > + DONE; > > + } > > +) > > The pattern should probably set the "length" attribute to 8. > > Looks good to me with those changes FWIW. > > Thanks, > Richard