On Wed, Apr 13, 2022 at 11:03 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch resolves the last piece of PR target/70321 a code quality
> (P2 regression) affecting mainline.  Currently, for HJ's testcase:
>
> void foo (long long ixi)
> {
>   if (ixi != 14348907)
>     __builtin_abort ();
> }
>
> GCC with -m32 -O2 generates four instructions for the comparison:
>
>         movl    16(%esp), %eax
>         movl    20(%esp), %edx
>         xorl    $14348907, %eax
>         orl     %eax, %edx
>
> but with this patch it now requires only three, making better use of
> x86's addressing modes:
>
>         movl    16(%esp), %eax
>         xorl    $14348907, %eax
>         orl     20(%esp), %eax
>
> The solution is to expand "doubleword" equality/inequality expressions
> using flag setting COMPARE instructions for the early RTL passes, and
> then split them during split1, after STV and before reload.
> Hence on x86_64, we now see/allow things like:
>
> (insn 11 8 12 2 (set (reg:CCZ 17 flags)
>         (compare:CCZ (reg/v:TI 84 [ x ])
>             (reg:TI 96))) "cmpti.c":2:43 30 {*cmpti_doubleword}
>
> This allows the STV pass to decide whether it's preferrable to perform
> this comparison using vector operations, i.e. a pxor/ptest sequence,
> or as scalar integer operations, i.e. a xor/xor/or sequence.  Alas
> this required tweaking of the STV pass to recognize the "new" form of
> these comparisons and split out the pxor operation itself.  To confirm
> this still works as expected I've added a new STV test case:
>
> long long a[1024];
> long long b[1024];
>
> int foo()
> {
>   for (int i=0; i<1024; i++)
>   {
>     long long t = (a[i]<<8) | (b[i]<<24);
>     if (t == 0)
>       return 1;
>   }
>   return 0;
> }
>
> where with -m32 -O2 -msse4.1 the above comparison with zero should look
> like:
>
>         punpcklqdq      %xmm0, %xmm0
>         ptest   %xmm0, %xmm0
>
> Although this patch includes one or two minor tweaks to provide all the
> necessary infrastructure to support conversion of TImode comparisons to
> V1TImode (and SImode comparisons to V4SImode), STV doesn't yet implement
> these transformations, but this is something that can be considered after
> stage 4.  Indeed the new convert_compare functionality is split out
> into a method to simplify its potential reuse by the timode_scalar_chain
> class.
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures for -m64, and two (missed-optimization) FAILs on
> -m32 (both around pandn).  The first gcc.target/i386/pr65105-5.c is
> resolved by the patch I posted yesterday, the other i386/pr78794.c is
> easily fixed by tweaking STV's "gain" estimates, but it's preferrable to
> generate better code for both the scalar and vector implementations
> before tweaking these parameters (patches to follow shortly).
>
> Ok for mainline?

Unless there is a really compelling reason that warrants committing
this patch in stage 4, I'd propose to postpone it to next stage 1. The
patch is quite involved and touches a somewhat complex part of the
compiler (STV). Also, looking at the above description, the patch
introduces new functionality that represents only a part of the
complete solution, and as such opens a door for possible regressions,
which are not welcomed in stage 4 at all.

Uros.

>
> 2022-04-13  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR target/70321
>         * config/i386/i386-expand.cc (ix86_expand_branch): Don't decompose
>         DI mode equality/inequality using XOR here.  Instead generate a
>         COMPARE for doubleword modes (DImode on !TARGET_64BIT or TImode).
>         * config/i386/i386-features.cc (gen_gpr_to_xmm_move_src): Use
>         gen_rtx_SUBREG when NUNITS is 1, i.e. for TImode to V1TImode.
>         (general_scalar_chain::convert_compare): New function to convert
>         scalar equality/inequality comparison into vector operations.
>         (general_scalar_chain::convert_insn) [COMPARE]: Refactor. Call
>         new convert_compare helper method.
>         (convertible_comparion_p): Update to match doubleword COMPARE
>         of two register, memory or integer constant operands.
>         * config/i386/i386-features.h
> (general_scalar_chain::convert_compare):
>         Prototype/declare member function here.
>         * config/i386/i386.md (cstore<mode>4): Change mode to SDWIM, but
>         only allow new doubleword modes for EQ and NE operators.
>         (*cmp<dwi>_doubleword): New define_insn_and_split, to split a
>         doubleword comparison into a pair of XORs followed by an IOR to
>         set the (zero) flags register, optimizing the XORs if possible.
>         * config/i386/sse.md (V_AVX): Include V1TI and V2TI in mode
> iterator;
>         V_AVX is (currently) only used by ptest.
>         (sse4_1 mode attribute): Update to support V1TI and V2TI.
>
> gcc/testsuite/ChangeLog
>         PR target/70321
>         * gcc.target/i386/pr70321.c: New test case.
>         * gcc.target/i386/sse4_1-stv-1.c: New test case.
>
>
> Thanks in advance,
> Roger
> --
>

Reply via email to