On Fri, Aug 5, 2022 at 8:36 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch moves the lowering of 128-bit V1TImode shifts and rotations by
> constant bit counts to sequences of SSE operations from the RTL expansion
> pass to the pre-reload split pass.  Postponing this splitting of shifts
> and rotates enables (will enable) the TImode equivalents of these
> operations/instructions to be considered as candidates by the (TImode)
> STV pass.
> Technically, this patch changes the existing expanders to continue to
> lower shifts by variable amounts, but constant operands become RTL
> instructions, specified by define_insn_and_split that are triggered by
> x86_pre_reload_split.  The one minor complication is that logical shifts
> by multiples of eight don't get split, but are handled by existing insn
> patterns, such as sse2_ashlv1ti3 and sse2_lshrv1ti3.  There should be no
> changes in generated code with this patch, which just adjusts the pass
> in which transformations get applied.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}, with
> no new failures.  Ok for mainline?
>
>
>
> 2022-08-05  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/sse.md (ashlv1ti3): Delay lowering of logical left
>         shifts by constant bit counts.
>         (*ashlvti3_internal): New define_insn_and_split that lowers
>         logical left shifts by constant bit counts, that aren't multiples
>         of 8, before reload.
>         (lshrv1ti3): Delay lowering of logical right shifts by constant.
>         (*lshrv1ti3_internal): New define_insn_and_split that lowers
>         logical right shifts by constant bit counts, that aren't multiples
>         of 8, before reload.
>         (ashrv1ti3): Delay lowering of arithmetic right shifts by
>         constant bit counts.
>         (*ashrv1ti3_internal): New define_insn_and_split that lowers
>         arithmetic right shifts by constant bit counts before reload.
>         (rotlv1ti3): Delay lowering of rotate left by constant.
>         (*rotlv1ti3_internal): New define_insn_and_split that lowers
>         rotate left by constant bit counts before reload.
>         (rotrv1ti3): Delay lowering of rotate right by constant.
>         (*rotrv1ti3_internal): New define_insn_and_split that lowers
>         rotate right by constant bit counts before reload.

+(define_insn_and_split "*ashlv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
  (ashift:V1TI
  (match_operand:V1TI 1 "register_operand")
- (match_operand:QI 2 "general_operand")))]
-  "TARGET_SSE2 && TARGET_64BIT"
+ (match_operand:SI 2 "const_0_to_255_operand")))]
+  "TARGET_SSE2
+   && TARGET_64BIT
+   && (INTVAL (operands[2]) & 7) != 0

Please introduce const_0_to_255_not_mul_8_operand predicate.
Alternatively, and preferably, you can use pattern shadowing, where
the preceding, more constrained pattern will match before the
following, broader pattern does.
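
For reference, the suggested predicate might look roughly like this in
config/i386/predicates.md (a sketch only, not tested against the tree;
the name follows the proposal above):

```lisp
;; Sketch: match a CONST_INT in [0, 255] whose value is not a
;; multiple of 8, so byte-granular shifts stay with the existing
;; sse2_ashlv1ti3 / sse2_lshrv1ti3 patterns.
(define_predicate "const_0_to_255_not_mul_8_operand"
  (and (match_code "const_int")
       (match_test "IN_RANGE (INTVAL (op), 0, 255)
                    && (INTVAL (op) & 7) != 0")))
```

With such a predicate, the `(INTVAL (operands[2]) & 7) != 0` test could
move out of the insn condition and into the operand predicate itself.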

Uros.
