Because of the issue described in PR115610, late_combine is disabled by default. This series tries to fix the regressions and enable late_combine. There are 4 regressions observed.

1. The first one is related to pass_stv2: late_combine undoes the
   transformation done in that pass. Moving the pass after
   pass_late_combine solves the issue.

2. The second one is related to pass_rpad: both the pre_reload and the
   post_reload late_combine undo its transformation. So besides moving
   pass_rpad after the pre_reload late_combine, target_insn_cost is
   defined to prevent the post_reload pass_late_combine from reverting
   the optimization done in pass_rpad.

3. The third one is related to avx512 kmask: lshiftrt + zero_extend are
   combined into *<insn>si3_zext, which doesn't support the k
   alternative, so an extra move between GPR and KMASK is generated,
   which regresses the gcc.target/i386/zero_extendkmask.c
   scan-assembler-not (?n)shr[bwl] check. The solution is to extend the
   pattern with a ?k alternative, just like what we did before for
   other patterns.

4. The fourth one is fake: pass_late_combine generates better code but
   breaks the assembly scans, i.e. under the 32-bit target, gcc used to
   generate a broadcast from the stack and then do the real operation.
   After enabling late_combine, they're combined into embedded
   broadcast operations.

Tested with SPEC2017, late_combine reduces code size by ~0.6%, which
means there are lots of small improvements.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

liuhongt (3):
  [avx512 testsuite] Define mask as extern instead of uninitialized
    local variables.
  Extend lshifrtsi3_1_zext to ?k alternative.
  [x86] Enable flate-combine.

 gcc/config/i386/i386-features.cc              | 16 +++++++----
 gcc/config/i386/i386-options.cc               |  4 ---
 gcc/config/i386/i386-passes.def               |  4 +--
 gcc/config/i386/i386-protos.h                 |  1 +
 gcc/config/i386/i386.cc                       | 18 ++++++++++++
 gcc/config/i386/i386.md                       | 19 +++++++++----
 gcc/config/i386/sse.md                        | 28 +++++++++++++++++++
 .../gcc.target/i386/avx512bitalg-vpopcntb.c   |  3 +-
 .../gcc.target/i386/avx512bitalg-vpopcntbvl.c |  4 +--
 .../gcc.target/i386/avx512bitalg-vpopcntw.c   |  2 +-
 .../gcc.target/i386/avx512bitalg-vpopcntwvl.c |  4 +--
 .../i386/avx512f-broadcast-pr87767-1.c        |  4 +--
 .../i386/avx512f-broadcast-pr87767-5.c        |  1 -
 .../gcc.target/i386/avx512f-fmadd-sf-zmm-7.c  |  2 +-
 .../gcc.target/i386/avx512f-fmsub-sf-zmm-7.c  |  2 +-
 .../gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c |  2 +-
 .../gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c |  2 +-
 .../i386/avx512vl-broadcast-pr87767-1.c       |  4 +--
 .../i386/avx512vl-broadcast-pr87767-5.c       |  2 --
 .../i386/avx512vpopcntdq-vpopcntd.c           |  5 ++--
 .../i386/avx512vpopcntdq-vpopcntq.c           |  2 +-
 gcc/testsuite/gcc.target/i386/pr91333.c       |  2 +-
 .../gcc.target/i386/vect-strided-4.c          |  2 +-
 23 files changed, 93 insertions(+), 40 deletions(-)

-- 
2.31.1