Michael Collison <michael.colli...@arm.com> writes: > Richard Sandiford do you have any objections to the patch as it stands? > It doesn't appear as if anything is going to change in the mid-end > anytime soon.
I think one of the suggestions was to do it in expand, taking advantage of range info and TARGET_SHIFT_TRUNCATION_MASK. This would be like the current FMA_EXPR handling in expand_expr_real_2. I know there was talk about cleaner approaches, but at least doing the above seems cleaner than doing in the backend. It should also be a nicely-contained piece of work. Thanks, Richard > -----Original Message----- > From: Richard Sandiford [mailto:richard.sandif...@linaro.org] > Sent: Tuesday, August 22, 2017 9:11 AM > To: Richard Biener <richard.guent...@gmail.com> > Cc: Richard Kenner <ken...@vlsi1.ultra.nyu.edu>; Michael Collison > <michael.colli...@arm.com>; GCC Patches <gcc-patches@gcc.gnu.org>; nd > <n...@arm.com>; Andrew Pinski <pins...@gmail.com> > Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts > > Richard Biener <richard.guent...@gmail.com> writes: >> On Tue, Aug 22, 2017 at 9:29 AM, Richard Sandiford >> <richard.sandif...@linaro.org> wrote: >>> Richard Biener <richard.guent...@gmail.com> writes: >>>> On August 21, 2017 7:46:09 PM GMT+02:00, Richard Sandiford >>>> <richard.sandif...@linaro.org> wrote: >>>>>Richard Biener <richard.guent...@gmail.com> writes: >>>>>> On Tue, Aug 8, 2017 at 10:20 PM, Richard Kenner >>>>>> <ken...@vlsi1.ultra.nyu.edu> wrote: >>>>>>>> Correct. It is truncated for integer shift, but not simd shift >>>>>>>> instructions. We generate a pattern in the split that only >>>>>generates >>>>>>>> the integer shift instructions. >>>>>>> >>>>>>> That's unfortunate, because it would be nice to do this in >>>>>simplify_rtx, >>>>>>> since it's machine-independent, but that has to be conditioned on >>>>>>> SHIFT_COUNT_TRUNCATED, so you wouldn't get the benefit of it. >>>>>> >>>>>> SHIFT_COUNT_TRUNCATED should go ... you should express this in the >>>>>> patterns, like for example with >>>>>> >>>>>> (define_insn ashlSI3 >>>>>> [(set (match_operand 0 "") >>>>>> (ashl:SI (match_operand ... ) >>>>>> (subreg:QI (match_operand:SI ...)))] >>>>>> >>>>>> or an explicit and:SI and combine / simplify_rtx should apply the >>>>>magic >>>>>> optimization we expect. >>>>> >>>>>The problem with the explicit AND is that you'd end up with either >>>>>an AND of two constants for constant shifts, or with two separate >>>>>patterns, one for constant shifts and one for variable shifts. (And >>>>>the problem in theory with two patterns is that it reduces the RA's >>>>>freedom, although in practice I guess we'd always want a constant >>>>>shift where possible for cost reasons, and so the RA would never >>>>>need to replace pseudos with constants itself.) >>>>> >>>>>I think all useful instances of this optimisation will be exposed by >>>>>the gimple optimisers, so maybe expand could to do it based on >>>>>TARGET_SHIFT_TRUNCATION_MASK? That describes the optab rather than >>>>>the rtx code and it does take the mode into account. >>>> >>>> Sure, that could work as well and also take into account range info. >>>> But we'd then need named expanders and the result would still have >>>> the explicit and or need to be an unspec or a different RTL operation. >>> >>> Without SHIFT_COUNT_TRUNCATED, out-of-range rtl shifts have >>> target-dependent rather than undefined behaviour, so it's OK for a >>> target to use shift codes with out-of-range values. >> >> Hmm, but that means simplify-rtx can't do anything with them because >> we need to preserve target dependent behavior. > > Yeah, it needs to punt. In practice that shouldn't matter much. > >> I think the RTL IL should be always well-defined and its semantics >> shouldn't have any target dependences (ideally, and if, then they >> should be well specified via extra target hooks/macros). > > That would be nice :-) I think the problem has traditionally been that >> shifts can be used in quite a few define_insn patterns besides those >> for shift instructions. So if your target defines shifts to have >> 256-bit precision (say) then you need to make sure that every >> define_insn with a shift rtx will honour that. > > It's more natural for target guarantees to apply to instructions than to >> rtx codes. > >>> And >>> TARGET_SHIFT_TRUNCATION_MASK is a guarantee from the target about how >>> the normal shift optabs behave, so I don't think we'd need new optabs >>> or new unspecs. >>> >>> E.g. it already works this way when expanding double-word shifts, >>> which IIRC is why TARGET_SHIFT_TRUNCATION_MASK was added. There it's >>> possible to use a shorter sequence if you know that the shift optab >>> truncates the count, so we can do that even if SHIFT_COUNT_TRUNCATED >>> isn't defined. >> >> I'm somewhat confused by docs saying TARGET_SHIFT_TRUNCATION_MASK >> applies to the instructions generated by the named shift patterns but >> _not_ general shift RTXen. But the generated pattern contains shift >> RTXen and how can we figure whether they were generated by the named >> expanders or by other means? Don't define_expand also serve as >> define_insn for things like combine? > > Yeah, you can't (and aren't supposed to try to) reverse-engineer the >> expander from the generated instructions. >> TARGET_SHIFT_TRUNCATION_MASK should only be used if you're expanding a >> shift optab. > >> That said, from looking at the docs and your description above it >> seems that semantics are not fully reflected in the RTL instruction stream? > > Yeah, the semantics go from being well-defined in the optab interface to > being target-dependent in the rtl stream. > > Thanks, > Richard > >> >> Richard. >> >>> Thanks, >>> Richard >>> >>>> >>>> Richard. >>>> >>>>>Thanks, >>>>>Richard