On Fri, Sep 15, 2017 at 09:15:17AM +0100, Michael Collison wrote:
> Patch updated with comments from Richard Sandiford to simplify the pattern.
> Okay for trunk?

OK.

Thanks,
James

> 
> 2017-09-14  Michael Collison <michael.colli...@arm.com>
> 
>       * config/aarch64/aarch64.md (*aarch64_reg_<optab>_minus<mode>3):
>       New pattern.
> 
> 2017-09-14  Michael Collison <michael.colli...@arm.com>
> 
>       * gcc.target/aarch64/var_shift_mask_2.c: New test.
> 
> Richard Sandiford <richard.sandif...@linaro.org> writes:
> > Richard Sandiford <richard.sandif...@linaro.org> writes:
> >> Michael Collison <michael.colli...@arm.com> writes:
> >>> Richard,
> >>>
> >>> The problem with this approach for AArch64 is that
> >>> TARGET_SHIFT_TRUNCATION_MASK is based on SHIFT_COUNT_TRUNCATED, which
> >>> is normally 0 as it is based on the TARGET_SIMD flag.
> >>
> >> Maybe I'm wrong, but that seems like a missed optimisation in itself.
> 
> Sorry to follow up on myself yet again, but I'd forgotten this was because we 
> allow the SIMD unit to do scalar shifts.  So I guess we have no choice, even 
> though it seems unfortunate.
> 
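> For reference, a minimal sketch of the kind of source the pattern below
> targets (a hypothetical example, not the actual var_shift_mask_2.c):
> 
>   unsigned int
>   f (unsigned int x, unsigned int n)
>   {
>     return x << (32 - n);
>   }
> 
> Since the shift instruction only uses the low bits of the count, the
> subtraction can be replaced by a negation of the count, so this can
> compile to a neg followed by a variable lsl.
> 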
> > +(define_insn_and_split "*aarch64_reg_<optab>_minus<mode>3"
> > +  [(set (match_operand:GPI 0 "register_operand" "=&r")
> > +   (ASHIFT:GPI
> > +     (match_operand:GPI 1 "register_operand" "r")
> > +     (minus:QI (match_operand 2 "const_int_operand" "n")
> > +               (match_operand:QI 3 "register_operand" "r"))))]
> > +  "INTVAL (operands[2]) == GET_MODE_BITSIZE (<MODE>mode)"
> > +  "#"
> > +  "&& true"
> > +  [(const_int 0)]
> > +  {
> > +    /* Handle cases where operand 3 is a plain QI register, or
> > +       a subreg of either an SImode or DImode register.  */
> > +
> > +    rtx subreg_tmp = (REG_P (operands[3])
> > +                 ? gen_lowpart_SUBREG (SImode, operands[3])
> > +                 : SUBREG_REG (operands[3]));
> > +
> > +    if (REG_P (subreg_tmp) && GET_MODE (subreg_tmp) == DImode)
> > +      subreg_tmp = gen_lowpart_SUBREG (SImode, subreg_tmp);
> 
> I think this all simplifies to:
> 
>   rtx subreg_tmp = gen_lowpart (SImode, operands[3]);
> 
> (or it would be worth having a comment that explains why not).
> As well as being shorter, it will properly simplify hard REGs to new hard 
> REGs.
> 
> > +    rtx tmp = (can_create_pseudo_p () ? gen_reg_rtx (SImode)
> > +          : operands[0]);
> > +
> > +    if (<MODE>mode == DImode && !can_create_pseudo_p ())
> > +      tmp = gen_lowpart_SUBREG (SImode, operands[0]);
> 
> I think this too would be simpler with gen_lowpart:
> 
>     rtx tmp = (can_create_pseudo_p () ? gen_reg_rtx (SImode)
>              : gen_lowpart (SImode, operands[0]));
> 
> > +
> > +    emit_insn (gen_negsi2 (tmp, subreg_tmp));
> > +
> > +    rtx and_op = gen_rtx_AND (SImode, tmp,
> > +                         GEN_INT (GET_MODE_BITSIZE (<MODE>mode) - 1));
> > +
> > +    rtx subreg_tmp2 = gen_lowpart_SUBREG (QImode, and_op);
> > +
> > +    emit_insn (gen_<optab><mode>3 (operands[0], operands[1], subreg_tmp2));
> > +    DONE;
> > +  }
> > +)
> 
> The pattern should probably set the "length" attribute to 8.
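> I.e. something along the lines of adding
> 
>   [(set_attr "length" "8")]
> 
> at the end of the define_insn_and_split, since the unsplit form stands
> for two instructions.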
> 
> Looks good to me with those changes FWIW.
> 
> Thanks,
> Richard

