On Fri, Aug 5, 2022 at 8:36 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
> This patch moves the lowering of 128-bit V1TImode shifts and rotations by
> constant bit counts to sequences of SSE operations from the RTL expansion
> pass to the pre-reload split pass.  Postponing this splitting of shifts
> and rotates enables (will enable) the TImode equivalents of these
> operations/instructions to be considered as candidates by the (TImode)
> STV pass.  Technically, this patch changes the existing expanders to
> continue to lower shifts by variable amounts, but constant operands become
> RTL instructions, specified by define_insn_and_split, that are triggered
> by x86_pre_reload_split.  The one minor complication is that logical
> shifts by multiples of eight don't get split, but are handled by existing
> insn patterns, such as sse2_ashlv1ti3 and sse2_lshrv1ti3.  There should
> be no changes in generated code with this patch, which just adjusts the
> pass in which the transformations get applied.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}, with
> no new failures.  Ok for mainline?
>
> 2022-08-05  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/sse.md (ashlv1ti3): Delay lowering of logical left
>         shifts by constant bit counts.
>         (*ashlv1ti3_internal): New define_insn_and_split that lowers
>         logical left shifts by constant bit counts, that aren't multiples
>         of 8, before reload.
>         (lshrv1ti3): Delay lowering of logical right shifts by constant.
>         (*lshrv1ti3_internal): New define_insn_and_split that lowers
>         logical right shifts by constant bit counts, that aren't multiples
>         of 8, before reload.
>         (ashrv1ti3): Delay lowering of arithmetic right shifts by
>         constant bit counts.
>         (*ashrv1ti3_internal): New define_insn_and_split that lowers
>         arithmetic right shifts by constant bit counts before reload.
>         (rotlv1ti3): Delay lowering of rotate left by constant.
>         (*rotlv1ti3_internal): New define_insn_and_split that lowers
>         rotate left by constant bit counts before reload.
>         (rotrv1ti3): Delay lowering of rotate right by constant.
>         (*rotrv1ti3_internal): New define_insn_and_split that lowers
>         rotate right by constant bit counts before reload.
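[Editor's note: to illustrate the kind of SSE sequence such a lowering produces, here is a hypothetical sketch in C intrinsics (not code from the patch): a 128-bit logical left shift by a constant bit count that is not a multiple of 8, here 3. The function name is invented for illustration.]

```c
#include <emmintrin.h>

/* Hypothetical sketch: shift a 128-bit value left by 3 bits using
   SSE2.  Each 64-bit half is shifted independently, then the bits
   that cross from the low half into the high half are ORed in.  */
static inline __m128i
shl_v1ti_3 (__m128i x)
{
  __m128i hi = _mm_slli_epi64 (x, 3);      /* shift both 64-bit halves left */
  __m128i lo = _mm_srli_epi64 (x, 64 - 3); /* bits carried out of each half */
  lo = _mm_slli_si128 (lo, 8);             /* move low half's carry into the high half */
  return _mm_or_si128 (hi, lo);
}
```

A shift by a multiple of 8 needs no such carry correction; it is a single byte-granular `pslldq`, which is why those cases keep their existing insn patterns.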
+(define_insn_and_split "*ashlv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
	(ashift:V1TI
	 (match_operand:V1TI 1 "register_operand")
-	 (match_operand:QI 2 "general_operand")))]
-  "TARGET_SSE2 && TARGET_64BIT"
+	 (match_operand:SI 2 "const_0_to_255_operand")))]
+  "TARGET_SSE2
+   && TARGET_64BIT
+   && (INTVAL (operands[2]) & 7) != 0

Please introduce a const_0_to_255_not_mul_8_operand predicate.

Alternatively, and preferably, you can use pattern shadowing, where the preceding, more constrained pattern will match before the following, broader pattern will.

Uros.
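[Editor's note: as a hypothetical sketch of Uros's suggestion (the predicate name follows his review comment; this is not committed code), such a predicate in config/i386/predicates.md might look like:]

```lisp
;; Sketch: match a const_int in [0, 255] that is not a multiple of 8.
(define_predicate "const_0_to_255_not_mul_8_operand"
  (and (match_code "const_int")
       (match_test "IN_RANGE (INTVAL (op), 0, 255)
                    && (INTVAL (op) & 7) != 0")))
```

With such a predicate on operand 2, the `INTVAL (operands[2]) & 7` test could be dropped from the insn condition. The pattern-shadowing alternative instead relies on the more constrained multiple-of-8 patterns appearing earlier in sse.md, so they match first.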