Re: [PATCH] [Aarch64] Optimize subtract in shift counts

Richard Sandiford Tue, 22 Aug 2017 01:12:09 -0700

Richard Biener <richard.guent...@gmail.com> writes:
> On Tue, Aug 22, 2017 at 9:29 AM, Richard Sandiford
> <richard.sandif...@linaro.org> wrote:
>> Richard Biener <richard.guent...@gmail.com> writes:
>>> On August 21, 2017 7:46:09 PM GMT+02:00, Richard Sandiford
>>> <richard.sandif...@linaro.org> wrote:
>>>>Richard Biener <richard.guent...@gmail.com> writes:
>>>>> On Tue, Aug 8, 2017 at 10:20 PM, Richard Kenner
>>>>> <ken...@vlsi1.ultra.nyu.edu> wrote:
>>>>>>> Correct. It is truncated for integer shift, but not simd shift
>>>>>>> instructions. We generate a pattern in the split that only
>>>>generates
>>>>>>> the integer shift instructions.
>>>>>>
>>>>>> That's unfortunate, because it would be nice to do this in
>>>>simplify_rtx,
>>>>>> since it's machine-independent, but that has to be conditioned on
>>>>>> SHIFT_COUNT_TRUNCATED, so you wouldn't get the benefit of it.
>>>>>
>>>>> SHIFT_COUNT_TRUNCATED should go ... you should express this in
>>>>> the patterns, like for example with
>>>>>
>>>>> (define_insn ashlSI3
>>>>>   [(set (match_operand 0 "")
>>>>>          (ashl:SI (match_operand ... )
>>>>>                      (subreg:QI (match_operand:SI ...)))]
>>>>>
>>>>> or an explicit and:SI and combine / simplify_rtx should apply the
>>>>magic
>>>>> optimization we expect.
>>>>
>>>>The problem with the explicit AND is that you'd end up with either
>>>>an AND of two constants for constant shifts, or with two separate
>>>>patterns,
>>>>one for constant shifts and one for variable shifts.  (And the problem
>>>>in
>>>>theory with two patterns is that it reduces the RA's freedom, although
>>>>in
>>>>practice I guess we'd always want a constant shift where possible for
>>>>cost reasons, and so the RA would never need to replace pseudos with
>>>>constants itself.)
>>>>
>>>>I think all useful instances of this optimisation will be exposed by
>>>>the gimple optimisers, so maybe expand could to do it based on
>>>>TARGET_SHIFT_TRUNCATION_MASK?  That describes the optab rather than
>>>>the rtx code and it does take the mode into account.
>>>
>>> Sure, that could work as well and also take into account range info. But
>>> we'd then need named expanders and the result would still have the
>>> explicit and or need to be an unspec or a different RTL operation.
>>
>> Without SHIFT_COUNT_TRUNCATED, out-of-range rtl shifts have
>> target-dependent rather than undefined behaviour, so it's OK
>> for a target to use shift codes with out-of-range values.
>
> Hmm, but that means simplify-rtx can't do anything with them because
> we need to preserve target dependent behavior.


Yeah, it needs to punt.  In practice that shouldn't matter much.

> I think the RTL IL should be always well-defined and its semantics
> shouldn't have any target dependences (ideally, and if, then they
> should be well specified via extra target hooks/macros).

That would be nice :-)  I think the problem has traditionally been
that shifts can be used in quite a few define_insn patterns besides
those for shift instructions.  So if your target defines shifts to have
256-bit precision (say) then you need to make sure that every define_insn
with a shift rtx will honour that.

It's more natural for target guarantees to apply to instructions than
to rtx codes.

>>  And
>> TARGET_SHIFT_TRUNCATION_MASK is a guarantee from the target about
>> how the normal shift optabs behave, so I don't think we'd need new
>> optabs or new unspecs.
>>
>> E.g. it already works this way when expanding double-word shifts,
>> which IIRC is why TARGET_SHIFT_TRUNCATION_MASK was added.  There it's
>> possible to use a shorter sequence if you know that the shift optab
>> truncates the count, so we can do that even if SHIFT_COUNT_TRUNCATED
>> isn't defined.
>
> I'm somewhat confused by docs saying TARGET_SHIFT_TRUNCATION_MASK
> applies to the instructions generated by the named shift patterns but _not_
> general shift RTXen.  But the generated pattern contains shift RTXen and how
> can we figure whether they were generated by the named expanders or
> by other means?  Don't define_expand also serve as define_insn for things
> like combine?

Yeah, you can't (and aren't supposed to try to) reverse-engineer the
expander from the generated instructions.  TARGET_SHIFT_TRUNCATION_MASK
should only be used if you're expanding a shift optab.

> That said, from looking at the docs and your description above it seems that
> semantics are not fully reflected in the RTL instruction stream?

Yeah, the semantics go from being well-defined in the optab interface
to being target-dependent in the rtl stream.

Thanks,
Richard

>
> Richard.
>
>> Thanks,
>> Richard
>>
>>>
>>> Richard.
>>>
>>>>Thanks,
>>>>Richard

Re: [PATCH] [Aarch64] Optimize subtract in shift counts

Reply via email to