Michael Collison <michael.colli...@arm.com> writes:
> Richard,
>
> The problem with this approach for Aarch64 is that
> TARGET_SHIFT_TRUNCATION_MASK is based on SHIFT_COUNT_TRUNCATED, which is
> normally 0 as it is based on the TARGET_SIMD flag.
Maybe I'm wrong, but that seems like a missed optimisation in itself.
Like you say, the definition is:

static unsigned HOST_WIDE_INT
aarch64_shift_truncation_mask (machine_mode mode)
{
  return
    (!SHIFT_COUNT_TRUNCATED
     || aarch64_vector_mode_supported_p (mode)
     || aarch64_vect_struct_mode_p (mode)) ? 0 : (GET_MODE_BITSIZE (mode) - 1);
}

SHIFT_COUNT_TRUNCATED is:

#define SHIFT_COUNT_TRUNCATED (!TARGET_SIMD)

and aarch64_vector_mode_supported_p always returns false for !TARGET_SIMD:

static bool
aarch64_vector_mode_supported_p (machine_mode mode)
{
  if (TARGET_SIMD
      && (mode == V4SImode  || mode == V8HImode  || mode == V16QImode
          || mode == V2DImode  || mode == V2SImode  || mode == V4HImode
          || mode == V8QImode  || mode == V2SFmode  || mode == V4SFmode
          || mode == V2DFmode  || mode == V4HFmode  || mode == V8HFmode
          || mode == V1DFmode))
    return true;

  return false;
}

So when does the second || condition fire?  I'm surprised the
aarch64_vect_struct_mode_p part is needed, since this hook describes
the shift optabs, and AArch64 doesn't provide any shift optabs for OI,
CI or XI.

Thanks,
Richard

> -----Original Message-----
> From: Richard Sandiford [mailto:richard.sandif...@linaro.org]
> Sent: Wednesday, September 6, 2017 11:32 AM
> To: Michael Collison <michael.colli...@arm.com>
> Cc: Richard Biener <richard.guent...@gmail.com>; Richard Kenner
> <ken...@vlsi1.ultra.nyu.edu>; GCC Patches <gcc-patches@gcc.gnu.org>; nd
> <n...@arm.com>; Andrew Pinski <pins...@gmail.com>
> Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts
>
> Michael Collison <michael.colli...@arm.com> writes:
>> Richard Sandiford do you have any objections to the patch as it stands?
>> It doesn't appear as if anything is going to change in the mid-end
>> anytime soon.
>
> I think one of the suggestions was to do it in expand, taking advantage
> of range info and TARGET_SHIFT_TRUNCATION_MASK.  This would be like the
> current FMA_EXPR handling in expand_expr_real_2.
>
> I know there was talk about cleaner approaches, but at least doing the
> above seems cleaner than doing it in the backend.  It should also be a
> nicely-contained piece of work.
>
> Thanks,
> Richard
>
>> -----Original Message-----
>> From: Richard Sandiford [mailto:richard.sandif...@linaro.org]
>> Sent: Tuesday, August 22, 2017 9:11 AM
>> To: Richard Biener <richard.guent...@gmail.com>
>> Cc: Richard Kenner <ken...@vlsi1.ultra.nyu.edu>; Michael Collison
>> <michael.colli...@arm.com>; GCC Patches <gcc-patches@gcc.gnu.org>; nd
>> <n...@arm.com>; Andrew Pinski <pins...@gmail.com>
>> Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts
>>
>> Richard Biener <richard.guent...@gmail.com> writes:
>>> On Tue, Aug 22, 2017 at 9:29 AM, Richard Sandiford
>>> <richard.sandif...@linaro.org> wrote:
>>>> Richard Biener <richard.guent...@gmail.com> writes:
>>>>> On August 21, 2017 7:46:09 PM GMT+02:00, Richard Sandiford
>>>>> <richard.sandif...@linaro.org> wrote:
>>>>>> Richard Biener <richard.guent...@gmail.com> writes:
>>>>>>> On Tue, Aug 8, 2017 at 10:20 PM, Richard Kenner
>>>>>>> <ken...@vlsi1.ultra.nyu.edu> wrote:
>>>>>>>>> Correct. It is truncated for integer shift, but not simd shift
>>>>>>>>> instructions. We generate a pattern in the split that only
>>>>>>>>> generates the integer shift instructions.
>>>>>>>>
>>>>>>>> That's unfortunate, because it would be nice to do this in
>>>>>>>> simplify_rtx, since it's machine-independent, but that has to be
>>>>>>>> conditioned on SHIFT_COUNT_TRUNCATED, so you wouldn't get the
>>>>>>>> benefit of it.
>>>>>>>
>>>>>>> SHIFT_COUNT_TRUNCATED should go ... you should express this in
>>>>>>> the patterns, like for example with
>>>>>>>
>>>>>>> (define_insn ashlSI3
>>>>>>>   [(set (match_operand 0 "")
>>>>>>>         (ashl:SI (match_operand ... )
>>>>>>>                  (subreg:QI (match_operand:SI ...)))]
>>>>>>>
>>>>>>> or an explicit and:SI and combine / simplify_rtx should apply the
>>>>>>> magic optimization we expect.
>>>>>>
>>>>>> The problem with the explicit AND is that you'd end up with either
>>>>>> an AND of two constants for constant shifts, or with two separate
>>>>>> patterns, one for constant shifts and one for variable shifts.
>>>>>> (And the problem in theory with two patterns is that it reduces the
>>>>>> RA's freedom, although in practice I guess we'd always want a
>>>>>> constant shift where possible for cost reasons, and so the RA would
>>>>>> never need to replace pseudos with constants itself.)
>>>>>>
>>>>>> I think all useful instances of this optimisation will be exposed
>>>>>> by the gimple optimisers, so maybe expand could do it based on
>>>>>> TARGET_SHIFT_TRUNCATION_MASK?  That describes the optab rather than
>>>>>> the rtx code and it does take the mode into account.
>>>>>
>>>>> Sure, that could work as well and also take into account range info.
>>>>> But we'd then need named expanders and the result would still have
>>>>> the explicit and or need to be an unspec or a different RTL operation.
>>>>
>>>> Without SHIFT_COUNT_TRUNCATED, out-of-range rtl shifts have
>>>> target-dependent rather than undefined behaviour, so it's OK for a
>>>> target to use shift codes with out-of-range values.
>>>
>>> Hmm, but that means simplify-rtx can't do anything with them because
>>> we need to preserve target dependent behavior.
>>
>> Yeah, it needs to punt.  In practice that shouldn't matter much.
>>
>>> I think the RTL IL should be always well-defined and its semantics
>>> shouldn't have any target dependences (ideally, and if, then they
>>> should be well specified via extra target hooks/macros).
>>
>> That would be nice :-)  I think the problem has traditionally been that
>> shifts can be used in quite a few define_insn patterns besides those
>> for shift instructions.  So if your target defines shifts to have
>> 256-bit precision (say) then you need to make sure that every
>> define_insn with a shift rtx will honour that.
>>
>> It's more natural for target guarantees to apply to instructions than
>> to rtx codes.
>>
>>>> And TARGET_SHIFT_TRUNCATION_MASK is a guarantee from the target about
>>>> how the normal shift optabs behave, so I don't think we'd need new
>>>> optabs or new unspecs.
>>>>
>>>> E.g. it already works this way when expanding double-word shifts,
>>>> which IIRC is why TARGET_SHIFT_TRUNCATION_MASK was added.  There it's
>>>> possible to use a shorter sequence if you know that the shift optab
>>>> truncates the count, so we can do that even if SHIFT_COUNT_TRUNCATED
>>>> isn't defined.
>>>
>>> I'm somewhat confused by docs saying TARGET_SHIFT_TRUNCATION_MASK
>>> applies to the instructions generated by the named shift patterns but
>>> _not_ general shift RTXen.  But the generated pattern contains shift
>>> RTXen and how can we figure whether they were generated by the named
>>> expanders or by other means?  Don't define_expand also serve as
>>> define_insn for things like combine?
>>
>> Yeah, you can't (and aren't supposed to try to) reverse-engineer the
>> expander from the generated instructions.
>> TARGET_SHIFT_TRUNCATION_MASK should only be used if you're expanding
>> a shift optab.
>>
>>> That said, from looking at the docs and your description above it
>>> seems that semantics are not fully reflected in the RTL instruction
>>> stream?
>>
>> Yeah, the semantics go from being well-defined in the optab interface
>> to being target-dependent in the rtl stream.
>>
>> Thanks,
>> Richard
>>
>>>
>>> Richard.
>>>
>>>> Thanks,
>>>> Richard
>>>>
>>>>>
>>>>> Richard.
>>>>>
>>>>>> Thanks,
>>>>>> Richard
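
The transformation discussed above ("[Aarch64] Optimize subtract in shift
counts") rests on the identity that, when the shift instruction itself
truncates its count -- the property SHIFT_COUNT_TRUNCATED and
TARGET_SHIFT_TRUNCATION_MASK describe -- a count of "size - n" is
equivalent to "-n" modulo the register size, so a SUB feeding the shift
can become a NEG.  The small standalone C sketch below checks that
identity for 64-bit shifts; it is illustrative only, not GCC source, and
the helper name shift_with_truncated_count and the 6-bit count mask
(matching AArch64's 64-bit integer shift instructions) are assumptions of
the sketch.

#include <stdint.h>
#include <stdio.h>
#include <assert.h>

/* Model of a shift instruction that truncates its count to the low six
   bits, i.e. a target whose TARGET_SHIFT_TRUNCATION_MASK would be 63
   for 64-bit shifts.  */
static uint64_t
shift_with_truncated_count (uint64_t x, uint64_t count)
{
  return x << (count & 63);
}

int
main (void)
{
  uint64_t x = 0x0123456789abcdefULL;

  /* For every count n in [1, 63], shifting by (64 - n) and by the
     two's-complement negation of n give the same result, because the
     two counts agree modulo 64.  This is why "sub; lsl" can become
     "neg; lsl" when the shift count is known to be truncated.  */
  for (uint64_t n = 1; n < 64; n++)
    assert (shift_with_truncated_count (x, 64 - n)
            == shift_with_truncated_count (x, -n));

  printf ("(64 - n) and (-n) agree as truncated shift counts\n");
  return 0;
}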