Because of the issue described in PR115610, late_combine is disabled by
default. This series tries to resolve the regressions and enable late_combine.
Four regressions were observed.

1. The first one is related to pass_stv2: late_combine reverts the
transformation done in that pass. Moving the pass after pass_late_combine
solves the issue.

2. The second one is related to pass_rpad: both the pre_reload and post_reload
late_combine passes would revert its transformation. So besides moving
pass_rpad after the pre_reload late_combine, target_insn_cost is defined to
prevent the post_reload pass_late_combine from reverting the optimization done
in pass_rpad.

3. The third one is related to avx512 kmask: lshiftrt + zero_extend are
combined into *<insn>si3_zext, which doesn't support the k alternative, so an
extra move between GPR and KMASK is generated. This regressed
gcc.target/i386/zero_extendkmask.c scan-assembler-not (?n)shr[bwl].
The solution is to extend the pattern with a ?k alternative, just as we did
before for other patterns.

4. The fourth one is not a real regression: pass_late_combine generates better
code, but that breaks the assembly scans.
E.g., on 32-bit targets, gcc used to generate a broadcast from the stack and
then do the real operation.
After enabling flate_combine, they're combined into embedded broadcast
operations.
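
For readers who want a concrete picture of regressions 3 and 4, here is a
minimal sketch of the source shapes involved. It is not taken from the
testsuite files: the function names are hypothetical, and whether GCC actually
emits the instructions discussed above depends on the options used (something
like -O2 -mavx512f, plus -m32 for the stack-passed scalar, is assumed). The
first function shows only the scalar shift and zero-extension pair; in the
regressed test the shifted value additionally lives in a mask register.

```c
#include <stddef.h>
#include <stdint.h>

/* Regression 3 shape: a 32-bit logical right shift whose result is then
   zero-extended to 64 bits (lshiftrt followed by zero_extend in RTL).
   The count is masked so the shift is always well defined in C.  */
uint64_t
shift_then_zext (uint32_t x, unsigned int n)
{
  return (uint64_t) (x >> (n & 31));
}

/* Regression 4 shape: one scalar is applied to every element.  If this
   loop is vectorized, S has to be broadcast to all lanes; that broadcast
   can be a separate instruction or be folded into the arithmetic
   instruction as an embedded broadcast memory operand.  */
void
add_scalar (float *dst, const float *src, float s, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] + s;
}
```

Comparing the assembly for such functions with and without late_combine is one
way to see the two effects described in items 3 and 4.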

Tested with SPEC2017: flate_combine reduces code size by ~0.6%, which means
there are lots of small improvements.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?


liuhongt (3):
  [avx512 testsuite] Define mask as extern instead of uninitialized
    local variables.
  Extend lshifrtsi3_1_zext to ?k alternative.
  [x86] Enable flate-combine.

 gcc/config/i386/i386-features.cc              | 16 +++++++----
 gcc/config/i386/i386-options.cc               |  4 ---
 gcc/config/i386/i386-passes.def               |  4 +--
 gcc/config/i386/i386-protos.h                 |  1 +
 gcc/config/i386/i386.cc                       | 18 ++++++++++++
 gcc/config/i386/i386.md                       | 19 +++++++++----
 gcc/config/i386/sse.md                        | 28 +++++++++++++++++++
 .../gcc.target/i386/avx512bitalg-vpopcntb.c   |  3 +-
 .../gcc.target/i386/avx512bitalg-vpopcntbvl.c |  4 +--
 .../gcc.target/i386/avx512bitalg-vpopcntw.c   |  2 +-
 .../gcc.target/i386/avx512bitalg-vpopcntwvl.c |  4 +--
 .../i386/avx512f-broadcast-pr87767-1.c        |  4 +--
 .../i386/avx512f-broadcast-pr87767-5.c        |  1 -
 .../gcc.target/i386/avx512f-fmadd-sf-zmm-7.c  |  2 +-
 .../gcc.target/i386/avx512f-fmsub-sf-zmm-7.c  |  2 +-
 .../gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c |  2 +-
 .../gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c |  2 +-
 .../i386/avx512vl-broadcast-pr87767-1.c       |  4 +--
 .../i386/avx512vl-broadcast-pr87767-5.c       |  2 --
 .../i386/avx512vpopcntdq-vpopcntd.c           |  5 ++--
 .../i386/avx512vpopcntdq-vpopcntq.c           |  2 +-
 gcc/testsuite/gcc.target/i386/pr91333.c       |  2 +-
 .../gcc.target/i386/vect-strided-4.c          |  2 +-
 23 files changed, 93 insertions(+), 40 deletions(-)

-- 
2.31.1
