RE: [PATCH] [Aarch64] Optimize subtract in shift counts

Michael Collison Wed, 06 Sep 2017 02:12:26 -0700

Richard Sandiford do you have any objections to the patch as it stands? It 
doesn't appear as if anything is going to change in the mid-end anytime soon.


-----Original Message-----
From: Richard Sandiford [mailto:richard.sandif...@linaro.org] 
Sent: Tuesday, August 22, 2017 9:11 AM
To: Richard Biener <richard.guent...@gmail.com>
Cc: Richard Kenner <ken...@vlsi1.ultra.nyu.edu>; Michael Collison 
<michael.colli...@arm.com>; GCC Patches <gcc-patches@gcc.gnu.org>; nd 
<n...@arm.com>; Andrew Pinski <pins...@gmail.com>
Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts

Richard Biener <richard.guent...@gmail.com> writes:
> On Tue, Aug 22, 2017 at 9:29 AM, Richard Sandiford 
> <richard.sandif...@linaro.org> wrote:
>> Richard Biener <richard.guent...@gmail.com> writes:
>>> On August 21, 2017 7:46:09 PM GMT+02:00, Richard Sandiford 
>>> <richard.sandif...@linaro.org> wrote:
>>>>Richard Biener <richard.guent...@gmail.com> writes:
>>>>> On Tue, Aug 8, 2017 at 10:20 PM, Richard Kenner 
>>>>> <ken...@vlsi1.ultra.nyu.edu> wrote:
>>>>>>> Correct. It is truncated for integer shift, but not simd shift 
>>>>>>> instructions. We generate a pattern in the split that only
>>>>generates
>>>>>>> the integer shift instructions.
>>>>>>
>>>>>> That's unfortunate, because it would be nice to do this in
>>>>simplify_rtx,
>>>>>> since it's machine-independent, but that has to be conditioned on 
>>>>>> SHIFT_COUNT_TRUNCATED, so you wouldn't get the benefit of it.
>>>>>
>>>>> SHIFT_COUNT_TRUNCATED should go ... you should express this in the 
>>>>> patterns, like for example with
>>>>>
>>>>> (define_insn ashlSI3
>>>>>   [(set (match_operand 0 "")
>>>>>          (ashl:SI (match_operand ... )
>>>>>                      (subreg:QI (match_operand:SI ...)))]
>>>>>
>>>>> or an explicit and:SI and combine / simplify_rtx should apply the
>>>>magic
>>>>> optimization we expect.
>>>>
>>>>The problem with the explicit AND is that you'd end up with either 
>>>>an AND of two constants for constant shifts, or with two separate 
>>>>patterns, one for constant shifts and one for variable shifts.  (And 
>>>>the problem in theory with two patterns is that it reduces the RA's 
>>>>freedom, although in practice I guess we'd always want a constant 
>>>>shift where possible for cost reasons, and so the RA would never 
>>>>need to replace pseudos with constants itself.)
>>>>
>>>>I think all useful instances of this optimisation will be exposed by 
>>>>the gimple optimisers, so maybe expand could to do it based on 
>>>>TARGET_SHIFT_TRUNCATION_MASK?  That describes the optab rather than 
>>>>the rtx code and it does take the mode into account.
>>>
>>> Sure, that could work as well and also take into account range info. 
>>> But we'd then need named expanders and the result would still have 
>>> the explicit and or need to be an unspec or a different RTL operation.
>>
>> Without SHIFT_COUNT_TRUNCATED, out-of-range rtl shifts have 
>> target-dependent rather than undefined behaviour, so it's OK for a 
>> target to use shift codes with out-of-range values.
>
> Hmm, but that means simplify-rtx can't do anything with them because 
> we need to preserve target dependent behavior.

Yeah, it needs to punt.  In practice that shouldn't matter much.

> I think the RTL IL should be always well-defined and its semantics 
> shouldn't have any target dependences (ideally, and if, then they 
> should be well specified via extra target hooks/macros).

That would be nice :-)  I think the problem has traditionally been that shifts 
can be used in quite a few define_insn patterns besides those for shift 
instructions.  So if your target defines shifts to have 256-bit precision (say) 
then you need to make sure that every define_insn with a shift rtx will honour 
that.

It's more natural for target guarantees to apply to instructions than to rtx 
codes.

>>  And
>> TARGET_SHIFT_TRUNCATION_MASK is a guarantee from the target about how 
>> the normal shift optabs behave, so I don't think we'd need new optabs 
>> or new unspecs.
>>
>> E.g. it already works this way when expanding double-word shifts, 
>> which IIRC is why TARGET_SHIFT_TRUNCATION_MASK was added.  There it's 
>> possible to use a shorter sequence if you know that the shift optab 
>> truncates the count, so we can do that even if SHIFT_COUNT_TRUNCATED 
>> isn't defined.
>
> I'm somewhat confused by docs saying TARGET_SHIFT_TRUNCATION_MASK 
> applies to the instructions generated by the named shift patterns but 
> _not_ general shift RTXen.  But the generated pattern contains shift 
> RTXen and how can we figure whether they were generated by the named 
> expanders or by other means?  Don't define_expand also serve as 
> define_insn for things like combine?

Yeah, you can't (and aren't supposed to try to) reverse-engineer the expander 
from the generated instructions.  TARGET_SHIFT_TRUNCATION_MASK should only be 
used if you're expanding a shift optab.

> That said, from looking at the docs and your description above it 
> seems that semantics are not fully reflected in the RTL instruction stream?

Yeah, the semantics go from being well-defined in the optab interface to being 
target-dependent in the rtl stream.

Thanks,
Richard

>
> Richard.
>
>> Thanks,
>> Richard
>>
>>>
>>> Richard.
>>>
>>>>Thanks,
>>>>Richard

RE: [PATCH] [Aarch64] Optimize subtract in shift counts

Reply via email to