This patch reimplements the MD patterns for the instructions that
perform narrowing right shifts with optional rounding and saturation
using standard RTL codes rather than unspecs.

There are four groups of patterns involved:

* Simple narrowing shifts with optional signed or unsigned truncation:
SHRN, SQSHRN, UQSHRN.  These are expressed as a truncation of a right
shift.  The matrix of valid combinations looks like this:

            |   ashiftrt   |   lshiftrt  |
------------------------------------------
ss_truncate |   SQSHRN     |      X      |
us_truncate |     X        |    UQSHRN   |
truncate    |     X        |     SHRN    |
------------------------------------------
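As a concrete illustration of the table above, one 16-bit to 8-bit lane of each instruction can be sketched in scalar C.  This is my own hypothetical model, not code from the patch; the function names and the assumption that the shift count n is in 1..8 are illustrative:

```c
#include <stdint.h>

/* Scalar sketch of one 16-bit -> 8-bit lane of each instruction;
   names are illustrative.  Shift count n is assumed to be in 1..8.  */

/* SHRN: truncate (lshiftrt x n).  */
static uint8_t shrn_lane (uint16_t x, int n)
{
  return (uint8_t) (x >> n);
}

/* SQSHRN: ss_truncate (ashiftrt x n).  Right-shifting a negative
   signed value is implementation-defined in C; GCC implements it as
   an arithmetic shift, which is what this sketch relies on.  */
static int8_t sqshrn_lane (int16_t x, int n)
{
  int16_t s = (int16_t) (x >> n);
  if (s > INT8_MAX) return INT8_MAX;
  if (s < INT8_MIN) return INT8_MIN;
  return (int8_t) s;
}

/* UQSHRN: us_truncate (lshiftrt x n).  */
static uint8_t uqshrn_lane (uint16_t x, int n)
{
  uint16_t s = (uint16_t) (x >> n);
  return s > UINT8_MAX ? UINT8_MAX : (uint8_t) s;
}
```

The X'd-out combinations in the table correspond to narrowings that no single instruction implements (e.g. signed saturation of a logical shift).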

* Narrowing shifts with rounding and optional signed or unsigned
truncation: RSHRN, SQRSHRN, UQRSHRN.  These follow the same
combinations of truncation and shift codes as above, but also perform
an intermediate widening of the operands in order to represent the
addition of the rounding constant.  These patterns also correct an
existing inaccuracy for RSHRN, where we previously did not model the
intermediate widening needed for rounding.
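The rounding behaviour described above can be sketched per lane in scalar C; again this is a hypothetical model with illustrative names, for 16-bit to 8-bit narrowing with n in 1..8.  The key point is that the rounding constant 1 << (n - 1) is added at a widened precision so the addition cannot wrap at the original element width:

```c
#include <stdint.h>

/* RSHRN: truncate of the widened rounding add followed by the shift.  */
static uint8_t rshrn_lane (uint16_t x, int n)
{
  uint32_t w = (uint32_t) x + (1u << (n - 1));  /* widened rounding add */
  return (uint8_t) (w >> n);
}

/* SQRSHRN: as above, with signed saturation on the narrowing.
   The shift of a negative value relies on GCC's arithmetic-shift
   behaviour, which is implementation-defined in standard C.  */
static int8_t sqrshrn_lane (int16_t x, int n)
{
  int32_t w = (int32_t) x + (1 << (n - 1));
  int32_t s = w >> n;
  if (s > INT8_MAX) return INT8_MAX;
  if (s < INT8_MIN) return INT8_MIN;
  return (int8_t) s;
}
```

Without the widening, an input near the top of the 16-bit range would wrap when the rounding constant is added, which is exactly the inaccuracy the new RSHRN pattern avoids.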

* The somewhat special "Signed saturating Shift Right Unsigned Narrow"
instruction: SQSHRUN.  Like the SQXTUN instructions, it performs a
saturating truncation that cannot be represented by US_TRUNCATE or
SS_TRUNCATE and instead needs a clamping operation followed by a
TRUNCATE.
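A scalar sketch of one lane (hypothetical model, illustrative name) shows why neither saturating truncation code fits: the input is signed but the result is clamped to the unsigned range of the narrow mode before a plain truncate:

```c
#include <stdint.h>

/* SQSHRUN, one 16-bit -> 8-bit lane: arithmetic shift, clamp to
   [0, 0xff], then a plain truncate.  The signed shift relies on
   GCC's arithmetic-shift behaviour.  */
static uint8_t sqshrun_lane (int16_t x, int n)
{
  int16_t s = (int16_t) (x >> n);       /* ashiftrt */
  if (s < 0) return 0;                  /* clamp below at zero */
  if (s > UINT8_MAX) return UINT8_MAX;  /* clamp above at 0xff */
  return (uint8_t) s;                   /* plain TRUNCATE      */
}
```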

* The rounding version of the above: SQRSHRUN.  It needs the special
clamping truncate representation but with an intermediate widening and
rounding addition.
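Combining the two, a scalar per-lane sketch of SQRSHRUN (again a hypothetical model with an illustrative name) is the clamp-and-truncate of SQSHRUN applied after the widened rounding addition:

```c
#include <stdint.h>

/* SQRSHRUN, one 16-bit -> 8-bit lane: widen, add the rounding
   constant, shift, then the same clamp-and-truncate as SQSHRUN.  */
static uint8_t sqrshrun_lane (int16_t x, int n)
{
  int32_t w = (int32_t) x + (1 << (n - 1));  /* widened rounding add */
  int32_t s = w >> n;                        /* arithmetic shift     */
  if (s < 0) return 0;
  if (s > UINT8_MAX) return UINT8_MAX;
  return (uint8_t) s;
}
```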

Besides using standard RTL codes for all of the above instructions, this
patch allows us to get rid of the explicit define_insns and
define_expands for SHRN and RSHRN.

Bootstrapped and tested on aarch64-none-linux-gnu and
aarch64_be-none-elf.  The execution tests in advsimd-intrinsics.exp
exercise these instructions fairly thoroughly; many instances get
constant-folded away during optimisation and the validation still
passes (during development, while I was working out the details of the
semantics, those tests did catch real failures), so I'm fairly
confident in the representation.

gcc/ChangeLog:

        * config/aarch64/aarch64-simd-builtins.def (shrn): Rename builtins to...
        (shrn_n): ... This.
        (rshrn): Rename builtins to...
        (rshrn_n): ... This.
        * config/aarch64/arm_neon.h (vshrn_n_s16): Adjust for the above.
        (vshrn_n_s32): Likewise.
        (vshrn_n_s64): Likewise.
        (vshrn_n_u16): Likewise.
        (vshrn_n_u32): Likewise.
        (vshrn_n_u64): Likewise.
        (vrshrn_n_s16): Likewise.
        (vrshrn_n_s32): Likewise.
        (vrshrn_n_s64): Likewise.
        (vrshrn_n_u16): Likewise.
        (vrshrn_n_u32): Likewise.
        (vrshrn_n_u64): Likewise.
        * config/aarch64/aarch64-simd.md
        (*aarch64_<srn_op>shrn<mode><vczle><vczbe>): Delete.
        (aarch64_shrn<mode>): Likewise.
        (aarch64_rshrn<mode><vczle><vczbe>_insn): Likewise.
        (aarch64_rshrn<mode>): Likewise.
        (aarch64_<sur>q<r>shr<u>n_n<mode>_insn<vczle><vczbe>): Likewise.
        (aarch64_<sur>q<r>shr<u>n_n<mode>): Likewise.
        (*aarch64_<shrn_op>shrn_n<mode>_insn<vczle><vczbe>): New define_insn.
        (*aarch64_<shrn_op>rshrn_n<mode>_insn<vczle><vczbe>): Likewise.
        (*aarch64_sqshrun_n<mode>_insn<vczle><vczbe>): Likewise.
        (*aarch64_sqrshrun_n<mode>_insn<vczle><vczbe>): Likewise.
        (aarch64_<shrn_op>shrn_n<mode>): New define_expand.
        (aarch64_<shrn_op>rshrn_n<mode>): Likewise.
        (aarch64_sqshrun_n<mode>): Likewise.
        (aarch64_sqrshrun_n<mode>): Likewise.
        * config/aarch64/iterators.md (ALL_TRUNC): New code iterator.
        (TRUNCEXTEND): New code attribute.
        (TRUNC_SHIFT): Likewise.
        (shrn_op): Likewise.
        * config/aarch64/predicates.md (aarch64_simd_umax_quarter_mode):
        New predicate.

Attachment: s1.patch