On Mon, May 30, 2022 at 11:18 AM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Mon, May 30, 2022 at 11:11 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
> >
> >
> > Hi Uros,
> > This is a ping of my patch from April, which as you've suggested should be
> > submitted for review even if there remain two missed-optimization
> > regressions on ia32 (to allow reviewers to better judge if those fixes
> > are appropriate/the best solution).
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593174.html
> >
> > The executive summary is that the core of this patch is a single pre-reload
> > splitter:
> >
> > (define_insn_and_split "*cmp<dwi>_doubleword"
> >   [(set (reg:CCZ FLAGS_REG)
> >        (compare:CCZ (match_operand:<DWI> 0 "nonimmediate_operand")
> >                     (match_operand:<DWI> 1 "x86_64_general_operand")))]
> >   "ix86_pre_reload_split ()"
> >   "#"
> >   "&& 1"
> >   [(parallel [(set (reg:CCZ FLAGS_REG)
> >                   (compare:CCZ (ior:DWIH (match_dup 4) (match_dup 5))
> >                                (const_int 0)))
> >              (set (match_dup 4) (ior:DWIH (match_dup 4) (match_dup 5)))])]
> >
> > That allows the RTL optimizers to assume the target has a double word
> > equality/inequality comparison during combine, but then split this into
> > a CC-setting IOR of the lowpart and highpart just before reload.
> >
> > The intended effect is that for PR target/70321's test case:
> >
> > void foo (long long ixi)
> > {
> >   if (ixi != 14348907)
> >     __builtin_abort ();
> > }
> >
> > where with -m32 -O2 GCC previously produced:
> >
> >         movl    16(%esp), %eax
> >         movl    20(%esp), %edx
> >         xorl    $14348907, %eax
> >         orl     %eax, %edx
> >         jne     .L3
> >
> > we now produce the slightly improved:
> >
> >         movl    16(%esp), %eax
> >         xorl    $14348907, %eax
> >         orl     20(%esp), %eax
> >         jne     .L3
> >
> > Similar improvements are seen with __int128 equality on TARGET_64BIT.
> >
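
For readers following along, the identity the splitter relies on can be
sketched in plain C. This is only an illustration under stated assumptions,
not code from the patch: dw_ne is a hypothetical helper name, and the 32-bit
halves stand in for the DWIH lowpart/highpart of a DImode value on ia32.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the patch): doubleword inequality
   computed from the two word-sized halves.  XOR each half with the
   matching half of the constant, then IOR the results; the IOR is zero
   exactly when both halves match, which is what the zero flag reports.  */
static int dw_ne(uint64_t x, uint64_t c)
{
    uint32_t x_lo = (uint32_t)x;
    uint32_t x_hi = (uint32_t)(x >> 32);
    uint32_t c_lo = (uint32_t)c;
    uint32_t c_hi = (uint32_t)(c >> 32);

    /* When c_hi is zero, x_hi ^ c_hi is just x_hi, so that XOR can be
       dropped, as in the shorter -m32 sequence quoted above.  */
    return ((x_lo ^ c_lo) | (x_hi ^ c_hi)) != 0;
}
```

With the constant 14348907 the high half is zero, so only the low half needs
an XOR before the IOR.
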
> > The rest of the patch, in fact the bulk of it, is purely to adjust the other
> > parts of the i386 backend that make the assumption that double word
> > equality has been lowered during RTL expansion, including for example
> > STV which turns DImode equality into SSE ptest, which previously
> > explicitly looked for the IOR of lowpart/highpart.
> >
> > This patch has been retested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, with no new failures.  However, when adding
> > --target_board=unix{-m32} there are two new missed-optimization FAILs,
> > both related to pandn:
> > FAIL: gcc.target/i386/pr65105-5.c scan-assembler pandn
> > FAIL: gcc.target/i386/pr78794.c scan-assembler pandn
> >
> > These become the requested test cases for the fix proposed here:
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595390.html
> >
> > OK for mainline, now we're in stage 1?
> >
> >
> > 2022-05-30  Roger Sayle  <ro...@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         PR target/70321
> >         * config/i386/i386-expand.cc (ix86_expand_branch): Don't decompose
> >         DI mode equality/inequality using XOR here.  Instead generate a
> >         COMPARE for doubleword modes (DImode on !TARGET_64BIT or TImode).
> >         * config/i386/i386-features.cc (gen_gpr_to_xmm_move_src): Use
> >         gen_rtx_SUBREG when NUNITS is 1, i.e. for TImode to V1TImode.
> >         (general_scalar_chain::convert_compare): New function to convert
> >         scalar equality/inequality comparison into vector operations.
> >         (general_scalar_chain::convert_insn) [COMPARE]: Refactor. Call
> >         new convert_compare helper method.
> >         (convertible_comparion_p): Update to match doubleword COMPARE
> >         of two register, memory or integer constant operands.
> >         * config/i386/i386-features.h
> >         (general_scalar_chain::convert_compare): Prototype/declare
> >         member function here.
> >         * config/i386/i386.md (cstore<mode>4): Change mode to SDWIM, but
> >         only allow new doubleword modes for EQ and NE operators.
> >         (*cmp<dwi>_doubleword): New define_insn_and_split, to split a
> >         doubleword comparison into a pair of XORs followed by an IOR to
> >         set the (zero) flags register, optimizing the XORs if possible.
> >         * config/i386/sse.md (V_AVX): Include V1TI and V2TI in mode
> >         iterator; V_AVX is (currently) only used by ptest.
> >         (sse4_1 mode attribute): Update to support V1TI and V2TI.
> >
> > gcc/testsuite/ChangeLog
> >         PR target/70321
> >         * gcc.target/i386/pr70321.c: New test case.
> >         * gcc.target/i386/sse4_1-stv-1.c: New test case.

@@ -0,0 +1,10 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2" } */

Ah, you will need explicit -mstv -mno-stackrealign here.
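
A sketch of what the adjusted test header might look like, assuming the two
flags are simply appended to the existing -O2 in dg-options (the final patch
may order or combine them differently):

```c
/* Assumed form of the updated directives; only the two extra flags
   come from the review comment above.  */
/* { dg-do compile { target ia32 } } */
/* { dg-options "-O2 -mstv -mno-stackrealign" } */
```

Spelling the flags out keeps the test independent of the defaults of
whichever target board runs it.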

Uros.
