Hi,

This is an update on our previous discussion.

1. I implemented the following two approaches in the latest upstream gcc:

A. Adding real initialization during gimplification, without maintaining the
   uninitialized warnings.

D. Adding calls to .DEFFERED_INIT during gimplification, and expanding
   .DEFFERED_INIT to real initialization during expand. Adjusting the
   uninitialized pass to handle the new refs to “.DEFFERED_INIT”.

Note, in this initial implementation:

  ** I ONLY implemented -ftrivial-auto-var-init=zero; the implementation of
     -ftrivial-auto-var-init=pattern is not done yet. Therefore, the
     performance data is only about -ftrivial-auto-var-init=zero.
  ** I added a temporary option -fauto-var-init-approach=A|B|C|D to choose
     implementation A or D for the runtime performance study.
  ** I didn’t finish the uninitialized warnings maintenance work for D.
     (That might take more time than I expected.)

2. I collected runtime data for CPU2017 on an x86 machine with this new gcc
for the following 3 cases:

no: default (-g -O2 -march=native)
A:  default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=A
D:  default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=D

And then computed the slowdown for both A and D relative to "no", as follows:

benchmarks          A / no    D / no

500.perlbench_r      1.25%     1.25%
502.gcc_r            0.68%     1.80%
505.mcf_r            0.68%     0.14%
520.omnetpp_r        4.83%     4.68%
523.xalancbmk_r      0.18%     1.96%
525.x264_r           1.55%     2.07%
531.deepsjeng_r     11.57%    11.85%
541.leela_r          0.64%     0.80%
557.xz_r            -0.41%    -0.41%

507.cactuBSSN_r      0.44%     0.44%
508.namd_r           0.34%     0.34%
510.parest_r         0.17%     0.25%
511.povray_r        56.57%    57.27%
519.lbm_r            0.00%     0.00%
521.wrf_r           -0.28%    -0.37%
526.blender_r       16.96%    17.71%
527.cam4_r           0.70%     0.53%
538.imagick_r        2.40%     2.40%
544.nab_r            0.00%    -0.65%

avg                  5.17%     5.37%

From the above data, we can see that, in general, the runtime slowdowns of
implementations A and D are similar for each individual benchmark.
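
For anyone who wants to re-check the numbers, here is a small Python sketch that recomputes the avg row from the per-benchmark figures in the table above and finds the benchmark where A and D differ the most. The function names are only illustrative. The `slowdown_pct` formula (extra run time relative to the "no" baseline) is an assumption about how the percentages were derived; the raw run times are not part of this mail.

```python
# Slowdown percentages copied from the table above: (A / no, D / no).
SLOWDOWN_PCT = {
    "500.perlbench_r": (1.25, 1.25),
    "502.gcc_r": (0.68, 1.80),
    "505.mcf_r": (0.68, 0.14),
    "520.omnetpp_r": (4.83, 4.68),
    "523.xalancbmk_r": (0.18, 1.96),
    "525.x264_r": (1.55, 2.07),
    "531.deepsjeng_r": (11.57, 11.85),
    "541.leela_r": (0.64, 0.80),
    "557.xz_r": (-0.41, -0.41),
    "507.cactuBSSN_r": (0.44, 0.44),
    "508.namd_r": (0.34, 0.34),
    "510.parest_r": (0.17, 0.25),
    "511.povray_r": (56.57, 57.27),
    "519.lbm_r": (0.00, 0.00),
    "521.wrf_r": (-0.28, -0.37),
    "526.blender_r": (16.96, 17.71),
    "527.cam4_r": (0.70, 0.53),
    "538.imagick_r": (2.40, 2.40),
    "544.nab_r": (0.00, -0.65),
}


def slowdown_pct(base_seconds, test_seconds):
    """Assumed definition: extra run time relative to the baseline, in percent."""
    return (test_seconds / base_seconds - 1.0) * 100.0


def column_mean(index):
    """Arithmetic mean of one column (0 = A, 1 = D) over all benchmarks."""
    values = [row[index] for row in SLOWDOWN_PCT.values()]
    return sum(values) / len(values)


def largest_gap():
    """Benchmark with the largest absolute difference between A and D."""
    return max(SLOWDOWN_PCT, key=lambda name: abs(SLOWDOWN_PCT[name][1] - SLOWDOWN_PCT[name][0]))


if __name__ == "__main__":
    print(f"avg A: {column_mean(0):.2f}%")
    print(f"avg D: {column_mean(1):.2f}%")
    print(f"largest A/D gap: {largest_gap()}")
```

The plain arithmetic mean over the 19 benchmarks reproduces the 5.17% and 5.37% in the avg row, and the largest A/D gap is on 523.xalancbmk_r (0.18% vs. 1.96%).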

There are several benchmarks that have a significant slowdown with the newly
added initialization for both A and D, for example 511.povray_r,
526.blender_r, and 531.deepsjeng_r. I will study a little more what kind of
new initializations introduced such slowdowns.

From the study so far, I think that approach D should be good enough for our
final implementation. So, I will try to finish approach D with the following
remaining work:

  ** complete the implementation of -ftrivial-auto-var-init=pattern;
  ** complete the implementation of the uninitialized warnings maintenance
     work for D.

Let me know if you have any comments or suggestions on my current and future
work.

Thanks a lot for your help.

Qing

> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>
> The following are the approaches I will implement and compare:
>
> Our final goal is to keep the uninitialized warning and minimize the run-time
> performance cost.
>
> A. Adding real initialization during gimplification, not maintain the
> uninitialized warnings.
> B. Adding real initialization during gimplification, marking them with
> “artificial_init”.
> Adjusting uninitialized pass, maintaining the annotation, making sure the
> real init not
> Deleted from the fake init.
> C. Marking the DECL for an uninitialized auto variable as “no_explicit_init”
> during gimplification,
> maintain this “no_explicit_init” bit till after
> pass_late_warn_uninitialized, or till pass_expand,
> add real initialization for all DECLs that are marked with
> “no_explicit_init”.
> D. Adding .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT
> during expand to
> real initialization. Adjusting uninitialized pass with the new refs with
> “.DEFFERED_INIT”.
>
>
> In the above, approach A will be the one that have the minimum run-time cost,
> will be the base for the performance
> comparison.
>
> I will implement approach D then, this one is expected to have the most
> run-time overhead among the above list, but
> Implementation should be the cleanest among B, C, D. Let’s see how much more
> performance overhead this approach
> will be. If the data is good, maybe we can avoid the effort to implement B,
> and C.
>
> If the performance of D is not good, I will implement B or C at that time.
>
> Let me know if you have any comment or suggestions.
>
> Thanks.
>
> Qing