On Mon, May 30, 2022 at 11:18 AM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Mon, May 30, 2022 at 11:11 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
> >
> >
> > Hi Uros,
> > This is a ping of my patch from April, which as you've suggested should be
> > submitted for review even if there remain two missed-optimization
> > regressions on ia32 (to allow reviewers to better judge if those fixes are
> > appropriate/the best solution).
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593174.html
> >
> > The executive summary is that the core of this patch is a single pre-reload
> > splitter:
> >
> > (define_insn_and_split "*cmp<dwi>_doubleword"
> >   [(set (reg:CCZ FLAGS_REG)
> >         (compare:CCZ (match_operand:<DWI> 0 "nonimmediate_operand")
> >                      (match_operand:<DWI> 1 "x86_64_general_operand")))]
> >   "ix86_pre_reload_split ()"
> >   "#"
> >   "&& 1"
> >   [(parallel [(set (reg:CCZ FLAGS_REG)
> >                    (compare:CCZ (ior:DWIH (match_dup 4) (match_dup 5))
> >                                 (const_int 0)))
> >               (set (match_dup 4) (ior:DWIH (match_dup 4) (match_dup 5)))])]
> >
> > That allows the RTL optimizers to assume the target has a double word
> > equality/inequality comparison during combine, but then split this into
> > a CC-setting IOR of the lowpart and highpart just before reload.
> >
> > The intended effect is that for PR target/70321's test case:
> >
> > void foo (long long ixi)
> > {
> >   if (ixi != 14348907)
> >     __builtin_abort ();
> > }
> >
> > where with -m32 -O2 GCC previously produced:
> >
> >         movl    16(%esp), %eax
> >         movl    20(%esp), %edx
> >         xorl    $14348907, %eax
> >         orl     %eax, %edx
> >         jne     .L3
> >
> > we now produce the slightly improved:
> >
> >         movl    16(%esp), %eax
> >         xorl    $14348907, %eax
> >         orl     20(%esp), %eax
> >         jne     .L3
> >
> > Similar improvements are seen with __int128 equality on TARGET_64BIT.
> >
> > The rest of the patch, in fact the bulk of it, is purely to adjust the other
> > parts of the i386 backend that make the assumption that double word
> > equality has been lowered during RTL expansion, including for example
> > STV which turns DImode equality into SSE ptest, which previously
> > explicitly looked for the IOR of lowpart/highpart.
> >
> > This patch has been retested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, with no new failures.  However, when adding
> > --target_board=unix{-m32} there are two new missed-optimization FAILs,
> > both related to pandn.
> > FAIL: gcc.target/i386/pr65105-5.c scan-assembler pandn
> > FAIL: gcc.target/i386/pr78794.c scan-assembler pandn
> >
> > These become the requested test cases for the fix proposed here:
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595390.html
> >
> > OK for mainline, now we're in stage 1?
> >
> >
> > 2022-05-30  Roger Sayle  <ro...@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         PR target/70321
> >         * config/i386/i386-expand.cc (ix86_expand_branch): Don't decompose
> >         DI mode equality/inequality using XOR here.  Instead generate a
> >         COMPARE for doubleword modes (DImode on !TARGET_64BIT or TImode).
> >         * config/i386/i386-features.cc (gen_gpr_to_xmm_move_src): Use
> >         gen_rtx_SUBREG when NUNITS is 1, i.e. for TImode to V1TImode.
> >         (general_scalar_chain::convert_compare): New function to convert
> >         scalar equality/inequality comparison into vector operations.
> >         (general_scalar_chain::convert_insn) [COMPARE]: Refactor.  Call
> >         new convert_compare helper method.
> >         (convertible_comparion_p): Update to match doubleword COMPARE
> >         of two register, memory or integer constant operands.
> >         * config/i386/i386-features.h (general_scalar_chain::convert_compare):
> >         Prototype/declare member function here.
> >         * config/i386/i386.md (cstore<mode>4): Change mode to SDWIM, but
> >         only allow new doubleword modes for EQ and NE operators.
> >         (*cmp<dwi>_doubleword): New define_insn_and_split, to split a
> >         doubleword comparison into a pair of XORs followed by an IOR to
> >         set the (zero) flags register, optimizing the XORs if possible.
> >         * config/i386/sse.md (V_AVX): Include V1TI and V2TI in mode iterator;
> >         V_AVX is (currently) only used by ptest.
> >         (sse4_1 mode attribute): Update to support V1TI and V2TI.
> >
> > gcc/testsuite/ChangeLog
> >         PR target/70321
> >         * gcc.target/i386/pr70321.c: New test case.
> >         * gcc.target/i386/sse4_1-stv-1.c: New test case.

@@ -0,0 +1,10 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2" } */

Ah, you will need explicit -mstv -mno-stackrealign here.

Uros.
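
As a reading aid for the quoted summary, here is a minimal C sketch of the lowering the ChangeLog describes: a doubleword equality test written as the IOR of the XORed low and high halves, compared against zero. The helper names (dw_ne_lowered, dw_ne_const, dw_selfcheck) are hypothetical and exist only for illustration. This models the described RTL at the source level, assuming 32-bit halves of a 64-bit value as on ia32; it is not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: doubleword inequality written
   the way the summary describes it.  XOR each 32-bit half with the matching
   half of the other operand, IOR the two results, then test for zero.  */
static int
dw_ne_lowered (uint64_t a, uint64_t b)
{
  uint32_t lo = (uint32_t) a ^ (uint32_t) b;
  uint32_t hi = (uint32_t) (a >> 32) ^ (uint32_t) (b >> 32);
  return (lo | hi) != 0;
}

/* The same test against the constant from the PR target/70321 example.
   The high half of 14348907 is zero, so the high-part XOR is a no-op and
   only the low-part XOR remains, consistent with the single xorl and
   single orl in the quoted -m32 -O2 output.  */
static int
dw_ne_const (uint64_t x)
{
  uint32_t lo = (uint32_t) x ^ UINT32_C (14348907);
  uint32_t hi = (uint32_t) (x >> 32);
  return (lo | hi) != 0;
}

/* Self-check: the lowered forms agree with a plain comparison.  */
static void
dw_selfcheck (void)
{
  assert (dw_ne_lowered (14348907, 14348907) == 0);
  assert (dw_ne_const (14348907) == 0);
  assert (dw_ne_const ((UINT64_C (1) << 32) | 14348907) == 1);
}
```

Calling dw_selfcheck () from a small driver exercises both forms. The last assertion covers the case where only the high half differs, which is the case the IOR with the high part exists to catch.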