On Wed, Apr 13, 2022 at 11:03 AM Roger Sayle <ro...@nextmovesoftware.com> wrote: > > > This patch resolves the last piece of PR target/70321 a code quality > (P2 regression) affecting mainline. Currently, for HJ's testcase: > > void foo (long long ixi) > { > if (ixi != 14348907) > __builtin_abort (); > } > > GCC with -m32 -O2 generates four instructions for the comparison: > > movl 16(%esp), %eax > movl 20(%esp), %edx > xorl $14348907, %eax > orl %eax, %edx > > but with this patch it now requires only three, making better use of > x86's addressing modes: > > movl 16(%esp), %eax > xorl $14348907, %eax > orl 20(%esp), %eax > > The solution is to expand "doubleword" equality/inequality expressions > using flag setting COMPARE instructions for the early RTL passes, and > then split them during split1, after STV and before reload. > Hence on x86_64, we now see/allow things like: > > (insn 11 8 12 2 (set (reg:CCZ 17 flags) > (compare:CCZ (reg/v:TI 84 [ x ]) > (reg:TI 96))) "cmpti.c":2:43 30 {*cmpti_doubleword} > > This allows the STV pass to decide whether it's preferrable to perform > this comparison using vector operations, i.e. a pxor/ptest sequence, > or as scalar integer operations, i.e. a xor/xor/or sequence. Alas > this required tweaking of the STV pass to recognize the "new" form of > these comparisons and split out the pxor operation itself. To confirm > this still works as expected I've added a new STV test case: > > long long a[1024]; > long long b[1024]; > > int foo() > { > for (int i=0; i<1024; i++) > { > long long t = (a[i]<<8) | (b[i]<<24); > if (t == 0) > return 1; > } > return 0; > } > > where with -m32 -O2 -msse4.1 the above comparison with zero should look > like: > > punpcklqdq %xmm0, %xmm0 > ptest %xmm0, %xmm0 > > Although this patch includes one or two minor tweaks to provide all the > necessary infrastructure to support conversion of TImode comparisons to > V1TImode (and SImode comparisons to V4SImode), STV doesn't yet implement > these transformations, but this is something that can be considered after > stage 4. Indeed the new convert_compare functionality is split out > into a method to simplify its potential reuse by the timode_scalar_chain > class. > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32}, > with no new failures for -m64, and two (missed-optimization) FAILs on > -m32 (both around pandn). The first gcc.target/i386/pr65105-5.c is > resolved by the patch I posted yesterday, the other i386/pr78794.c is > easily fixed by tweaking STV's "gain" estimates, but it's preferrable to > generate better code for both the scalar and vector implementations > before tweaking these parameters (patches to follow shortly). > > Ok for mainline?
Unless there is a really compelling reason that warrants committing this patch in stage 4, I'd propose to postpone it to next stage 1. The patch is quite involved and touches a somewhat complex part of the compiler (STV). Also, looking at the above description, the patch introduces new functionality that represents only a part of the complete solution, and as such opens a door for possible regressions, which are not welcomed in stage 4 at all. Uros. > > 2022-04-13 Roger Sayle <ro...@nextmovesoftware.com> > > gcc/ChangeLog > PR target/70321 > * config/i386/i386-expand.cc (ix86_expand_branch): Don't decompose > DI mode equality/inequality using XOR here. Instead generate a > COMPARE for doubleword modes (DImode on !TARGET_64BIT or TImode). > * config/i386/i386-features.cc (gen_gpr_to_xmm_move_src): Use > gen_rtx_SUBREG when NUNITS is 1, i.e. for TImode to V1TImode. > (general_scalar_chain::convert_compare): New function to convert > scalar equality/inequality comparison into vector operations. > (general_scalar_chain::convert_insn) [COMPARE]: Refactor. Call > new convert_compare helper method. > (convertible_comparion_p): Update to match doubleword COMPARE > of two register, memory or integer constant operands. > * config/i386/i386-features.h > (general_scalar_chain::convert_compare): > Prototype/declare member function here. > * config/i386/i386.md (cstore<mode>4): Change mode to SDWIM, but > only allow new doubleword modes for EQ and NE operators. > (*cmp<dwi>_doubleword): New define_insn_and_split, to split a > doubleword comparison into a pair of XORs followed by an IOR to > set the (zero) flags register, optimizing the XORs if possible. > * config/i386/sse.md (V_AVX): Include V1TI and V2TI in mode > iterator; > V_AVX is (currently) only used by ptest. > (sse4_1 mode attribute): Update to support V1TI and V2TI. > > gcc/testsuite/ChangeLog > PR target/70321 > * gcc.target/i386/pr70321.c: New test case. > * gcc.target/i386/sse4_1-stv-1.c: New test case. > > > Thanks in advance, > Roger > -- >