On Thu, Aug 31, 2023 at 5:15 AM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, 31 Aug 2023, Filip Kastl wrote:
>
> > > The most obvious places would be right after SSA construction and before 
> > > RTL expansion.
> > > Can you provide measurements for those positions?
> >
> > The algorithm should only remove PHIs that break SSA form minimality. Since
> > GCC's SSA construction already produces minimal SSA form, the algorithm
> > isn't expected to remove any PHIs if run right after the construction. I
> > even measured it and indeed -- no PHIs got removed (except for 502.gcc_r,
> > where the algorithm managed to remove exactly 1 PHI, which is weird).
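
For readers unfamiliar with the term, the simplest kind of redundant PHI is a trivial one: every operand is either the PHI's own result or one single other value. The sketch below checks for that case only. It assumes that "PHIs that break minimality" means redundant PHIs in this usual sense. `Phi` and `is_trivial_phi` are hypothetical names over toy integer IDs; this is neither GCC's IR nor the algorithm from the patch.

```cpp
#include <cassert>
#include <vector>

// Toy model (an assumption for illustration): SSA names are plain ints and a
// PHI is its result name plus the names of its incoming operands.
struct Phi {
    int result;
    std::vector<int> args;
};

// Returns true and stores the merged value in `value` if every operand of
// `phi` is either the PHI itself or one single other name. Returns false and
// leaves `value` untouched otherwise (including when all operands are
// self-references).
bool is_trivial_phi(const Phi& phi, int& value) {
    assert(!phi.args.empty());  // a PHI merges at least one incoming value
    bool found = false;
    int candidate = 0;
    for (int arg : phi.args) {
        if (arg == phi.result)
            continue;  // self-reference, e.g. around a loop back edge
        if (found && arg != candidate)
            return false;  // two distinct incoming values: a real merge
        candidate = arg;
        found = true;
    }
    if (found)
        value = candidate;
    return found;
}
```

A PHI like `x_3 = PHI <x_1, x_3, x_1>` is trivial under this check and could be replaced by `x_1`, while `x_3 = PHI <x_1, x_2>` is not.
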
> >
> > I tried putting the pass before pass_expand. There aren't a lot of PHIs to
> > remove at that point, but there still are some.
>
> That's interesting.  Your placement at
>
>           NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
>           NEXT_PASS (pass_phiopt, true /* early_p */);
> +         NEXT_PASS (pass_sccp);
>
> and
>
>        NEXT_PASS (pass_tsan);
>        NEXT_PASS (pass_dse, true /* use DR analysis */);
>        NEXT_PASS (pass_dce);
> +      NEXT_PASS (pass_sccp);
>
> isn't immediately after the "best" existing pass we have to
> remove dead PHIs, which is pass_cd_dce.  phiopt might leave
> dead PHIs around and the second instance runs long after the
> last CD-DCE.

Actually, the last phiopt runs before the last pass_cd_dce:
      NEXT_PASS (pass_dce, true /* update_address_taken_p */);
      /* After late DCE we rewrite no longer addressed locals into SSA
         form if possible.  */
      NEXT_PASS (pass_forwprop);
      NEXT_PASS (pass_sink_code, true /* unsplit edges */);
      NEXT_PASS (pass_phiopt, false /* early_p */);
      NEXT_PASS (pass_fold_builtins);
      NEXT_PASS (pass_optimize_widening_mul);
      NEXT_PASS (pass_store_merging);
      /* If DCE is not run before checking for uninitialized uses,
         we may get false warnings (e.g., testsuite/gcc.dg/uninit-5.c).
         However, this also causes us to misdiagnose cases that should be
         real warnings (e.g., testsuite/gcc.dg/pr18501.c).  */
      NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
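
The ordering in this excerpt can be checked mechanically. The following is only a sketch: `late_excerpt` and `runs_before` are hypothetical names, and the list holds just the passes quoted above, not the full pipeline.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Pass names copied from the excerpt above, in the order they appear there.
// This models only the excerpt, not the complete pass list.
const std::vector<std::string> late_excerpt = {
    "pass_dce",
    "pass_forwprop",
    "pass_sink_code",
    "pass_phiopt",
    "pass_fold_builtins",
    "pass_optimize_widening_mul",
    "pass_store_merging",
    "pass_cd_dce",
};

// True if pass `a` appears before pass `b` in `order`.
// Both names must be present; only the first occurrence of each is considered.
bool runs_before(const std::vector<std::string>& order,
                 const std::string& a, const std::string& b) {
    auto ia = std::find(order.begin(), order.end(), a);
    auto ib = std::find(order.begin(), order.end(), b);
    assert(ia != order.end() && ib != order.end());
    return ia < ib;
}
```

With this list, `runs_before(late_excerpt, "pass_phiopt", "pass_cd_dce")` is true, which is the point being made: a CD-DCE instance still runs after the late phiopt.
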

Thanks,
Andrew Pinski


>
> So I wonder if your pass just detects unnecessary PHIs we'd have
> removed by other means and what survives until RTL expansion is
> what we should count?



>
> Can you adjust your original early placement to right after
> the cd-dce pass and for the late placement turn the dce pass
> before it into cd-dce and re-do your measurements?
>
> > 500.perlbench_r
> > Started with 43111
> > Ended with 42942
> > Removed PHI % .39201131961680313700
> >
> > 502.gcc_r
> > Started with 141392
> > Ended with 140455
> > Removed PHI % .66269661649881181400
> >
> > 505.mcf_r
> > Started with 482
> > Ended with 478
> > Removed PHI % .82987551867219917100
> >
> > 523.xalancbmk_r
> > Started with 136040
> > Ended with 135629
> > Removed PHI % .30211702440458688700
> >
> > 531.deepsjeng_r
> > Started with 2150
> > Ended with 2148
> > Removed PHI % .09302325581395348900
> >
> > 541.leela_r
> > Started with 4664
> > Ended with 4650
> > Removed PHI % .30017152658662092700
> >
> > 557.xz_r
> > Started with 43
> > Ended with 43
> > Removed PHI % 0
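
For reference, the "Removed PHI %" figures above match (started - ended) / started * 100. A minimal sketch of that arithmetic follows; `removed_phi_percent` is a hypothetical helper name, not part of the patch.

```cpp
#include <cassert>

// Hypothetical helper: percentage of PHIs removed by the pass, given the PHI
// counts before ("Started with") and after ("Ended with").
double removed_phi_percent(long started, long ended) {
    assert(started > 0 && ended >= 0 && ended <= started);
    return 100.0 * static_cast<double>(started - ended)
                 / static_cast<double>(started);
}
```

For 500.perlbench_r, `removed_phi_percent(43111, 42942)` gives roughly 0.392, so the numbers in the list are already percentages, all below 1%.
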
> >
> > > Can the pass somehow be used as part of propagations like during value 
> > > numbering?
> >
> > I don't think that the pass could be used as a part of different
> > optimizations since it works on the whole CFG (except for copy propagation
> > as I noted in the RFC). I'm adding Honza into Cc. He'll have more insight
> > into this.
> >
> > > Could the new file be called gimple-ssa-sccp.cc or something similar?
> >
> > Certainly, though I wonder: wouldn't tree-ssa-sccp.cc be more
> > appropriate?
> >
> > I'm thinking about naming the pass 'scc-copy' and the file
> > 'tree-ssa-scc-copy.cc'.
> >
> > > Removing some PHIs is nice, but it would be also interesting to know what
> > > are the effects on generated code size and/or performance.
> > > And also if it has any effects on debug information coverage.
> >
> > Regarding performance: I ran some benchmarks on a Zen3 machine at -O3, both
> > with and without the new pass. *I got a ~2% speedup for 505.mcf_r and
> > 541.leela_r. Here are the full results. What do you think? Should I run more
> > benchmarks? Or benchmark multiple times? Or run the benchmarks on different
> > machines?*
> >
> > 500.perlbench_r
> > Without SCCP: 244.151807s
> > With SCCP: 242.448438s
> > -0.7025695913124297%
> >
> > 502.gcc_r
> > Without SCCP: 211.029606s
> > With SCCP: 211.614523s
> > +0.27640683243653763%
> >
> > 505.mcf_r
> > Without SCCP: 298.782621s
> > With SCCP: 291.671468s
> > -2.438069465197046%
> >
> > 523.xalancbmk_r
> > Without SCCP: 189.940639s
> > With SCCP: 189.876261s
> > -0.03390523894928332%
> >
> > 531.deepsjeng_r
> > Without SCCP: 250.63648s
> > With SCCP: 250.988624s
> > +0.1403027732444051%
> >
> > 541.leela_r
> > Without SCCP: 346.066278s
> > With SCCP: 339.692987s
> > -1.8761915152519792%
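
A note on how these percentages appear to be derived: they match (with - without) / with * 100, so the denominator is the "With SCCP" time rather than the baseline. The sketch below shows both variants. The function names are hypothetical and this is only a reconstruction from the numbers listed.

```cpp
#include <cassert>

// Hypothetical helper: signed run-time change as a percentage of the
// "With SCCP" time. This denominator reproduces the figures quoted above.
double runtime_delta_percent(double without_s, double with_s) {
    assert(without_s > 0.0 && with_s > 0.0);
    return 100.0 * (with_s - without_s) / with_s;
}

// The same change expressed relative to the baseline ("Without SCCP") time,
// which is the more common convention for reporting speedups.
double runtime_delta_vs_baseline_percent(double without_s, double with_s) {
    assert(without_s > 0.0 && with_s > 0.0);
    return 100.0 * (with_s - without_s) / without_s;
}
```

For 505.mcf_r the first function gives about -2.44% and the second about -2.38%, so the "~2%" summary holds under either convention.
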
> >
> > Regarding size: The pass doesn't seem to significantly reduce or increase
> > the size of the resulting binary. The differences were at most ~0.1%.
> >
> > Regarding debug info coverage: I didn't notice any additional guality
> > testcases failing after I applied the patch. *Is there any other way I
> > should check debug info coverage?*
> >
> >
> > Filip K
> >
>
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
