On Thu, Aug 31, 2023 at 5:15 AM Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > On Thu, 31 Aug 2023, Filip Kastl wrote: > > > > The most obvious places would be right after SSA construction and before > > > RTL expansion. > > > Can you provide measurements for those positions? > > > > The algorithm should only remove PHIs that break SSA form minimality. Since > > GCC's SSA construction already produces minimal SSA form, the algorithm > > isn't > > expected to remove any PHIs if run right after the construction. I even > > measured it and indeed -- no PHIs got removed (except for 502.gcc_r, where > > the > > algorithm managed to remove exactly 1 PHI, which is weird). > > > > I tried putting the pass before pass_expand. There isn't a lot of PHIs to > > remove at that point, but there still are some. > > That's interesting. Your placement at > > NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */); > NEXT_PASS (pass_phiopt, true /* early_p */); > + NEXT_PASS (pass_sccp); > > and > > NEXT_PASS (pass_tsan); > NEXT_PASS (pass_dse, true /* use DR analysis */); > NEXT_PASS (pass_dce); > + NEXT_PASS (pass_sccp); > > isn't immediately after the "best" existing pass we have to > remove dead PHIs which is pass_cd_dce. phiopt might leave > dead PHIs around and the second instance runs long after the > last CD-DCE.
Actually the last phiopt is run before last pass_cd_dce: NEXT_PASS (pass_dce, true /* update_address_taken_p */); /* After late DCE we rewrite no longer addressed locals into SSA form if possible. */ NEXT_PASS (pass_forwprop); NEXT_PASS (pass_sink_code, true /* unsplit edges */); NEXT_PASS (pass_phiopt, false /* early_p */); NEXT_PASS (pass_fold_builtins); NEXT_PASS (pass_optimize_widening_mul); NEXT_PASS (pass_store_merging); /* If DCE is not run before checking for uninitialized uses, we may get false warnings (e.g., testsuite/gcc.dg/uninit-5.c). However, this also causes us to misdiagnose cases that should be real warnings (e.g., testsuite/gcc.dg/pr18501.c). */ NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */); Thanks, Andrew Pinski > > So I wonder if your pass just detects unnecessary PHIs we'd have > removed by other means and what survives until RTL expansion is > what we should count? > > Can you adjust your original early placement to right after > the cd-dce pass and for the late placement turn the dce pass > before it into cd-dce and re-do your measurements? > > > 500.perlbench_r > > Started with 43111 > > Ended with 42942 > > Removed PHI % .39201131961680313700 > > > > 502.gcc_r > > Started with 141392 > > Ended with 140455 > > Removed PHI % .66269661649881181400 > > > > 505.mcf_r > > Started with 482 > > Ended with 478 > > Removed PHI % .82987551867219917100 > > > > 523.xalancbmk_r > > Started with 136040 > > Ended with 135629 > > Removed PHI % .30211702440458688700 > > > > 531.deepsjeng_r > > Started with 2150 > > Ended with 2148 > > Removed PHI % .09302325581395348900 > > > > 541.leela_r > > Started with 4664 > > Ended with 4650 > > Removed PHI % .30017152658662092700 > > > > 557.xz_r > > Started with 43 > > Ended with 43 > > Removed PHI % 0 > > > > > Can the pass somehow be used as part of propagations like during value > > > numbering? > > > > I don't think that the pass could be used as a part of different > > optimizations > > since it works on the whole CFG (except for copy propagation as I noted in > > the > > RFC). I'm adding Honza into Cc. He'll have more insight into this. > > > > > Could the new file be called gimple-ssa-sccp.cc or something similar? > > > > Certainly. Though I'm not sure, but wouldn't tree-ssa-sccp.cc be more > > appropriate? > > > > I'm thinking about naming the pass 'scc-copy' and the file > > 'tree-ssa-scc-copy.cc'. > > > > > Removing some PHIs is nice, but it would be also interesting to know what > > > are the effects on generated code size and/or performance. > > > And also if it has any effects on debug information coverage. > > > > Regarding performance: I ran some benchmarks on a Zen3 machine with -O3 with > > and without the new pass. *I got ~2% speedup for 505.mcf_r and 541.leela_r. > > Here are the full results. What do you think? Should I run more benchmarks? > > Or > > benchmark multiple times? Or run the benchmarks on different machines?* > > > > 500.perlbench_r > > Without SCCP: 244.151807s > > With SCCP: 242.448438s > > -0.7025695913124297% > > > > 502.gcc_r > > Without SCCP: 211.029606s > > With SCCP: 211.614523s > > +0.27640683243653763% > > > > 505.mcf_r > > Without SCCP: 298.782621s > > With SCCP: 291.671468s > > -2.438069465197046% > > > > 523.xalancbmk_r > > Without SCCP: 189.940639s > > With SCCP: 189.876261s > > -0.03390523894928332% > > > > 531.deepsjeng_r > > Without SCCP: 250.63648s > > With SCCP: 250.988624s > > +0.1403027732444051% > > > > 541.leela_r > > Without SCCP: 346.066278s > > With SCCP: 339.692987s > > -1.8761915152519792% > > > > Regarding size: The pass doesn't seem to significantly reduce or increase > > the > > size of the result binary. The differences were at most ~0.1%. > > > > Regarding debug info coverage: I didn't notice any additional guality > > testcases > > failing after I applied the patch. *Is there any other way how I should > > check > > debug info coverage?* > > > > > > Filip K > > > > -- > Richard Biener <rguent...@suse.de> > SUSE Software Solutions Germany GmbH, > Frankenstrasse 146, 90461 Nuernberg, Germany; > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)