Bootstrapped and regtested on x86_64-redhat-linux, s390x-redhat-linux and ppc64le-redhat-linux.

Previous iteration: https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00495.html

In the end, the main question was: does this make the code better on
architectures other than s390?
https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00993.html

Not sure whether it's already too late for this one, but I'd like to at
least post the updated code, my observations and SPEC CPU results.

- Code size decreases in most cases.  In general, the main side-effect
  of this patch is that after jump threading bbro pass builds different
  traces and reorders and merges basic blocks differently:

  # x86_64-redhat-linux:
  436.cactusADM   274479 insns  -528 smaller  # maximum decrease
  526.blender_r  2773303 insns  -203 smaller
  502.gcc_r      2262388 insns  -142 smaller
  403.gcc         815367 insns  -106 smaller
  ...
  525.x264_r      174450 insns   +10 bigger   # maximum increase

  # ppc64le-redhat-linux:
  526.blender_r  3422613 insns  -276 smaller  # maximum decrease
  521.wrf_r      6008722 insns  -228 smaller
  520.omnetpp_r   612626 insns   -52 smaller
  ...
  435.gromacs     338597 insns   +16 bigger   # maximum increase

- Compilation performance did not seem to have been affected in a
  measurable way.  According to -ftime-report, the total user time of
  SPEC CPU build used to be 26018s, and now it is 25985s, the
  difference being -0.12%.

- Run time differences are all over the place:

  # x86_64-redhat-linux:
  548.exchange2_r  -1.82%
  541.leela_r      -1.59%
  538.imagick_r    -0.95%
  520.omnetpp_r    -0.94%
  403.gcc          -0.76%
  447.dealII       -0.58%
  526.blender_r    -0.56%
  450.soplex       -0.51%
  # skip |dt| < 0.5%
  523.xalancbmk_r  +0.52%
  416.gamess       +0.61%
  503.bwaves_r     +0.62%
  445.gobmk        +0.66%
  456.hmmer        +0.70%
  549.fotonik3d_r  +0.74%
  471.omnetpp      +0.99%
  459.GemsFDTD     +1.09%
  554.roms_r       +1.30%
  500.perlbench_r  +1.56%
  483.xalancbmk    +1.60%

  # ppc64le-redhat-linux:
  511.povray_r     -1.29%
  482.sphinx3      -0.65%
  456.hmmer        -0.53%
  519.lbm_r        -0.51%
  # skip |dt| < 0.5%
  549.fotonik3d_r  +1.13%
  403.gcc          +1.76%
  500.perlbench_r  +2.35%

I've investigated 483.xalancbmk and 500.perlbench_r regressions on x86_64.
Even though the total 483.xalancbmk size slightly decreases, we get 4%
more icache misses and 25% more stalls because of that.  I couldn't
pin that down to a certain function or line of code - can this be due
to somehow generally worsened locality?

500.perlbench_r has 25% more indirect branch mispredicts, particularly
when perl_run ends up calling Perl_pp_rv2av, Perl_pp_gvsv and
Perl_pp_nextstate.  I have to admit I don't know what could have caused
that.

Consider the following RTL:

(insn (set (reg 65) (if_then_else (eq %cc 0) 1 0)))
(insn (parallel [(set %cc (compare (reg 65) 0)) (clobber %scratch)]))
(jump_insn (set %pc (if_then_else (ne %cc 0) (label_ref 23) %pc)))

Combine simplifies this into:

(note NOTE_INSN_DELETED)
(note NOTE_INSN_DELETED)
(jump_insn (set %pc (if_then_else (eq %cc 0) (label_ref 23) %pc)))

opening up the possibility to perform jump threading.

gcc/ChangeLog:

2018-09-19  Ilya Leoshkevich  <i...@linux.ibm.com>

	PR target/80080
	* cfgcleanup.c (class pass_postreload_jump): New pass.
	(pass_postreload_jump::execute): Likewise.
	(make_pass_postreload_jump): Likewise.
	* passes.def: Add pass_postreload_jump before
	pass_postreload_cse.
	* tree-pass.h (make_pass_postreload_jump): New pass.

gcc/testsuite/ChangeLog:

2018-09-05  Ilya Leoshkevich  <i...@linux.ibm.com>

	PR target/80080
	* gcc.target/s390/pr80080-4.c: New test.
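
For readers who want to convince themselves that the combine
simplification above preserves the branch condition, here is a minimal
C sketch.  It is not part of the patch.  The helper names and the
condition-code encoding used by model_compare (0 = equal, 1 = lower,
2 = higher) are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a compare insn that sets the
   condition code: 0 if equal, 1 if lower, 2 if higher.  */
static int
model_compare (int a, int b)
{
  return a == b ? 0 : (a < b ? 1 : 2);
}

/* Models the three insns before combine.  */
static bool
branch_taken_before (int cc)
{
  /* (set (reg 65) (if_then_else (eq %cc 0) 1 0)) */
  int reg65 = (cc == 0) ? 1 : 0;
  /* (set %cc (compare (reg 65) 0)) */
  int cc2 = model_compare (reg65, 0);
  /* (if_then_else (ne %cc 0) (label_ref 23) %pc) */
  return cc2 != 0;
}

/* Models the single jump insn left after combine:
   (if_then_else (eq %cc 0) (label_ref 23) %pc).  */
static bool
branch_taken_after (int cc)
{
  return cc == 0;
}

/* Counts the condition-code values in 0..3 for which the two forms
   disagree about whether the branch is taken.  */
static int
count_mismatches (void)
{
  int mismatches = 0;
  for (int cc = 0; cc <= 3; cc++)
    if (branch_taken_before (cc) != branch_taken_after (cc))
      mismatches++;
  return mismatches;
}
```

Under this model, count_mismatches () returns 0 when called from a
small driver: both forms take the branch exactly when the incoming
condition code is 0.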
---
 gcc/cfgcleanup.c                          | 42 +++++++++++++++++++++++
 gcc/passes.def                            |  1 +
 gcc/testsuite/gcc.target/s390/pr80080-4.c | 16 +++++++++
 gcc/tree-pass.h                           |  1 +
 4 files changed, 60 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/pr80080-4.c

diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
index 4a5dc29d14f..bc4a78889db 100644
--- a/gcc/cfgcleanup.c
+++ b/gcc/cfgcleanup.c
@@ -3259,6 +3259,48 @@ make_pass_jump (gcc::context *ctxt)
 
 namespace {
 
+const pass_data pass_data_postreload_jump =
+{
+  RTL_PASS, /* type */
+  "postreload_jump", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_JUMP, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_postreload_jump : public rtl_opt_pass
+{
+public:
+  pass_postreload_jump (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_postreload_jump, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual unsigned int execute (function *);
+
+}; // class pass_postreload_jump
+
+unsigned int
+pass_postreload_jump::execute (function *)
+{
+  cleanup_cfg (flag_thread_jumps ? CLEANUP_THREADING : 0);
+  return 0;
+}
+
+} // anon namespace
+
+rtl_opt_pass *
+make_pass_postreload_jump (gcc::context *ctxt)
+{
+  return new pass_postreload_jump (ctxt);
+}
+
+namespace {
+
 const pass_data pass_data_jump2 =
 {
   RTL_PASS, /* type */
diff --git a/gcc/passes.def b/gcc/passes.def
index 82ad9404b9e..0079fecef32 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -458,6 +458,7 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_reload);
       NEXT_PASS (pass_postreload);
       PUSH_INSERT_PASSES_WITHIN (pass_postreload)
+          NEXT_PASS (pass_postreload_jump);
           NEXT_PASS (pass_postreload_cse);
           NEXT_PASS (pass_gcse2);
           NEXT_PASS (pass_split_after_reload);
diff --git a/gcc/testsuite/gcc.target/s390/pr80080-4.c b/gcc/testsuite/gcc.target/s390/pr80080-4.c
new file mode 100644
index 00000000000..5fc6a558008
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/pr80080-4.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target { lp64 } } } */
+/* { dg-options "-march=z196 -O2" } */
+
+extern void bar(int *mem);
+
+void foo4(int *mem)
+{
+  int oldval = 0;
+  if (!__atomic_compare_exchange_n (mem, (void *) &oldval, 1,
+                                   1, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    {
+      bar (mem);
+    }
+}
+
+/* { dg-final { scan-assembler {(?n)\n\tlt\t.*\n\tjne\t(\.L\d+)\n(.*\n)*\tcs\t.*\n\tber\t%r14\n\1:\n\tjg\tbar\n} } } */
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 2f8779ee4b8..b20d34c15e9 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -579,6 +579,7 @@ extern rtl_opt_pass *make_pass_clean_state (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_branch_prob (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_value_profile_transformations (gcc::context
                                                               *ctxt);
+extern rtl_opt_pass *make_pass_postreload_jump (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_postreload_cse (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_gcse2 (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_split_after_reload (gcc::context *ctxt);
-- 
2.19.1