This patch moves the lowering of 128-bit V1TImode shifts and rotations by constant bit counts (to sequences of SSE operations) from the RTL expansion pass to the pre-reload split pass.  Postponing this splitting of shifts and rotates enables (or rather, will enable) the TImode equivalents of these operations/instructions to be considered as candidates by the (TImode) STV pass.

Technically, this patch changes the existing expanders so that they continue to lower shifts by variable amounts immediately, but shifts and rotates by constant amounts now become RTL instructions, specified by define_insn_and_split, that are split by ix86_pre_reload_split.  The one minor complication is that logical shifts by multiples of eight don't get split, but are instead handled by existing insn patterns, such as sse2_ashlv1ti3 and sse2_lshrv1ti3.  There should be no changes in generated code with this patch, which simply adjusts the pass in which these transformations are applied.  An illustrative sketch of the cases involved follows.
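To make the distinction concrete, here is a minimal GNU C sketch of the three left-shift cases.  The typedef follows the style of the existing V1TI tests, but the function names and exact shift counts are illustrative only, not part of this patch:

typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));

/* Variable count: still lowered directly at RTL expansion.  */
uv1ti shl_var (uv1ti x, unsigned int n) { return x << n; }

/* Constant count that isn't a multiple of 8: now emitted as a V1TI
   ashift and split by *ashlv1ti3_internal before reload.  */
uv1ti shl_17 (uv1ti x) { return x << 17; }

/* Constant multiple of 8: matched directly by the existing
   sse2_ashlv1ti3 (pslldq) pattern, so never split.  */
uv1ti shl_16 (uv1ti x) { return x << 16; }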
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?

2022-08-05  Roger Sayle  <ro...@nextmovesoftware.com>

gcc/ChangeLog
	* config/i386/sse.md (ashlv1ti3): Delay lowering of logical left
	shifts by constant bit counts.
	(*ashlv1ti3_internal): New define_insn_and_split that lowers
	logical left shifts by constant bit counts, that aren't multiples
	of 8, before reload.
	(lshrv1ti3): Delay lowering of logical right shifts by constant.
	(*lshrv1ti3_internal): New define_insn_and_split that lowers
	logical right shifts by constant bit counts, that aren't
	multiples of 8, before reload.
	(ashrv1ti3): Delay lowering of arithmetic right shifts by
	constant bit counts.
	(*ashrv1ti3_internal): New define_insn_and_split that lowers
	arithmetic right shifts by constant bit counts before reload.
	(rotlv1ti3): Delay lowering of rotate left by constant.
	(*rotlv1ti3_internal): New define_insn_and_split that lowers
	rotate left by constant bit counts before reload.
	(rotrv1ti3): Delay lowering of rotate right by constant.
	(*rotrv1ti3_internal): New define_insn_and_split that lowers
	rotate right by constant bit counts before reload.

Thanks in advance,
Roger
--
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 14d12d1..d3ea52f 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15995,10 +15995,30 @@

 (define_expand "ashlv1ti3"
   [(set (match_operand:V1TI 0 "register_operand")
+	(ashift:V1TI
+	 (match_operand:V1TI 1 "register_operand")
+	 (match_operand:QI 2 "general_operand")))]
+  "TARGET_SSE2 && TARGET_64BIT"
+{
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_shift (ASHIFT, operands);
+      DONE;
+    }
+})
+
+(define_insn_and_split "*ashlv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
 	(ashift:V1TI
 	 (match_operand:V1TI 1 "register_operand")
-	 (match_operand:QI 2 "general_operand")))]
-  "TARGET_SSE2 && TARGET_64BIT"
+	 (match_operand:SI 2 "const_0_to_255_operand")))]
+  "TARGET_SSE2
+   && TARGET_64BIT
+   && (INTVAL (operands[2]) & 7) != 0
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
 {
   ix86_expand_v1ti_shift (ASHIFT, operands);
   DONE;
@@ -16011,6 +16031,26 @@
 	 (match_operand:QI 2 "general_operand")))]
   "TARGET_SSE2 && TARGET_64BIT"
 {
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_shift (LSHIFTRT, operands);
+      DONE;
+    }
+})
+
+(define_insn_and_split "*lshrv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
+	(lshiftrt:V1TI
+	 (match_operand:V1TI 1 "register_operand")
+	 (match_operand:SI 2 "const_0_to_255_operand")))]
+  "TARGET_SSE2
+   && TARGET_64BIT
+   && (INTVAL (operands[2]) & 7) != 0
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
   ix86_expand_v1ti_shift (LSHIFTRT, operands);
   DONE;
 })
@@ -16022,6 +16062,26 @@
 	 (match_operand:QI 2 "general_operand")))]
   "TARGET_SSE2 && TARGET_64BIT"
 {
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_ashiftrt (operands);
+      DONE;
+    }
+})
+
+
+(define_insn_and_split "*ashrv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
+	(ashiftrt:V1TI
+	 (match_operand:V1TI 1 "register_operand")
+	 (match_operand:SI 2 "const_0_to_255_operand")))]
+  "TARGET_SSE2
+   && TARGET_64BIT
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
   ix86_expand_v1ti_ashiftrt (operands);
   DONE;
 })
@@ -16033,6 +16093,25 @@
 	 (match_operand:QI 2 "general_operand")))]
   "TARGET_SSE2 && TARGET_64BIT"
 {
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_rotate (ROTATE, operands);
+      DONE;
+    }
+})
+
+(define_insn_and_split "*rotlv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
+	(rotate:V1TI
+	 (match_operand:V1TI 1 "register_operand")
+	 (match_operand:SI 2 "const_0_to_255_operand")))]
+  "TARGET_SSE2
+   && TARGET_64BIT
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
   ix86_expand_v1ti_rotate (ROTATE, operands);
   DONE;
 })
@@ -16044,6 +16123,25 @@
 	 (match_operand:QI 2 "general_operand")))]
   "TARGET_SSE2 && TARGET_64BIT"
 {
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_rotate (ROTATERT, operands);
+      DONE;
+    }
+})
+
+(define_insn_and_split "*rotrv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
+	(rotatert:V1TI
+	 (match_operand:V1TI 1 "register_operand")
+	 (match_operand:SI 2 "const_0_to_255_operand")))]
+  "TARGET_SSE2
+   && TARGET_64BIT
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
   ix86_expand_v1ti_rotate (ROTATERT, operands);
   DONE;
 })
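For reference, a similar illustrative sketch (again, not a testcase from this patch) covers the arithmetic shift and rotate cases.  Note that, as in the diff above, *ashrv1ti3_internal has no (INTVAL (operands[2]) & 7) != 0 test, so arithmetic right shifts by any constant count are split before reload:

typedef __int128 sv1ti __attribute__ ((__vector_size__ (16)));
typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));

/* Any constant count: split before reload by *ashrv1ti3_internal.  */
sv1ti ashr_17 (sv1ti x) { return x >> 17; }

/* A shift/or idiom that GCC should recognize as a V1TI rotate,
   exercising rotlv1ti3 and then *rotlv1ti3_internal.  */
uv1ti rotl_3 (uv1ti x) { return (x << 3) | (x >> 125); }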