On Wed, 13 Jan 2021, Qing Zhao wrote:

>
> > On Jan 13, 2021, at 1:39 AM, Richard Biener <rguent...@suse.de> wrote:
> >
> > On Tue, 12 Jan 2021, Qing Zhao wrote:
> >
> >> Hi,
> >>
> >> Just checking in to see whether you have any comments and suggestions on this:
> >>
> >> FYI, I have been continuing with the Approach D implementation since last week:
> >>
> >> D. Adding calls to .DEFERRED_INIT during gimplification, expand the
> >> .DEFERRED_INIT during expand to
> >> real initialization. Adjusting uninitialized pass with the new refs with
> >> “.DEFERRED_INIT”.
> >>
> >> For the remaining work of Approach D:
> >>
> >> ** complete the implementation of -ftrivial-auto-var-init=pattern;
> >> ** complete the implementation of uninitialized warnings maintenance work
> >> for D.
> >>
> >> I have completed the uninitialized warnings maintenance work for D.
> >> And finished part of the -ftrivial-auto-var-init=pattern
> >> implementation.
> >>
> >> The following is the remaining work of Approach D:
> >>
> >> ** -ftrivial-auto-var-init=pattern for VLA;
> >> ** add a new attribute for variable:
> >> __attribute__((uninitialized))
> >> the marked variable is uninitialized intentionally for performance purposes.
> >> ** adding complete test cases;
> >>
> >>
> >> Please let me know if you have any objection to my current decision on
> >> implementing approach D.
> >
> > Did you do any analysis on how stack usage and code size are changed
> > with approach D?
>
> I did the code size change comparison (I will provide the data in another
> email). And with this data, D works better than A in general. (This is
> a surprise to me, actually.)
>
> But not the stack usage. Not sure how to collect the stack usage data,
> do you have any suggestion on this?

There is -fstack-usage you could use, then of course watching
the stack segment at runtime. I'm mostly concerned about
stack-limited "processes" such as the Linux kernel which I think
is a primary target of your work.

Richard.

>
> > How does compile-time behave (we could gobble up
> > lots of .DEFERRED_INIT calls I guess)?
> I can collect this data too and report it later.
>
> Thanks.
>
> Qing
> >
> > Richard.
> >
> >> Thanks a lot for your help.
> >>
> >> Qing
> >>
> >>
> >>> On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches
> >>> <gcc-patches@gcc.gnu.org> wrote:
> >>>
> >>> Hi,
> >>>
> >>> This is an update for our previous discussion.
> >>>
> >>> 1. I implemented the following two different implementations in the
> >>> latest upstream gcc:
> >>>
> >>> A. Adding real initialization during gimplification, not maintain the
> >>> uninitialized warnings.
> >>>
> >>> D. Adding calls to .DEFERRED_INIT during gimplification, expand the
> >>> .DEFERRED_INIT during expand to
> >>> real initialization. Adjusting uninitialized pass with the new refs with
> >>> “.DEFERRED_INIT”.
> >>>
> >>> Note, in this initial implementation,
> >>> ** I ONLY implement -ftrivial-auto-var-init=zero, the implementation of
> >>> -ftrivial-auto-var-init=pattern
> >>> is not done yet. Therefore, the performance data is only about
> >>> -ftrivial-auto-var-init=zero.
> >>>
> >>> ** I added a temporary option -fauto-var-init-approach=A|B|C|D to
> >>> choose implementation A or D for
> >>> runtime performance study.
> >>> ** I didn’t finish the uninitialized warnings maintenance work for D.
> >>> (That might take more time than I expected).
> >>>
> >>> 2. I collected runtime data for CPU2017 on an x86 machine with this new
> >>> gcc for the following 3 cases:
> >>>
> >>> no: default.
> >>>     (-g -O2 -march=native)
> >>> A: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=A
> >>> D: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=D
> >>>
> >>> And then compute the slowdown data for both A and D as follows:
> >>>
> >>> benchmarks          A / no    D / no
> >>>
> >>> 500.perlbench_r      1.25%     1.25%
> >>> 502.gcc_r            0.68%     1.80%
> >>> 505.mcf_r            0.68%     0.14%
> >>> 520.omnetpp_r        4.83%     4.68%
> >>> 523.xalancbmk_r      0.18%     1.96%
> >>> 525.x264_r           1.55%     2.07%
> >>> 531.deepsjeng_r     11.57%    11.85%
> >>> 541.leela_r          0.64%     0.80%
> >>> 557.xz_r            -0.41%    -0.41%
> >>>
> >>> 507.cactuBSSN_r      0.44%     0.44%
> >>> 508.namd_r           0.34%     0.34%
> >>> 510.parest_r         0.17%     0.25%
> >>> 511.povray_r        56.57%    57.27%
> >>> 519.lbm_r            0.00%     0.00%
> >>> 521.wrf_r           -0.28%    -0.37%
> >>> 526.blender_r       16.96%    17.71%
> >>> 527.cam4_r           0.70%     0.53%
> >>> 538.imagick_r        2.40%     2.40%
> >>> 544.nab_r            0.00%    -0.65%
> >>>
> >>> avg                  5.17%     5.37%
> >>>
> >>> From the above data, we can see that in general, the runtime performance
> >>> slowdown for
> >>> implementation A and D is similar for individual benchmarks.
> >>>
> >>> There are several benchmarks that have significant slowdown with the newly
> >>> added initialization for both
> >>> A and D, for example, 511.povray_r, 526.blender_r, and 531.deepsjeng_r, I
> >>> will try to study a little bit
> >>> more on what kind of new initializations introduced such slowdown.
> >>>
> >>> From the current study so far, I think that approach D should be good
> >>> enough for our final implementation.
> >>> So, I will try to finish approach D with the following remaining work
> >>>
> >>> ** complete the implementation of -ftrivial-auto-var-init=pattern;
> >>> ** complete the implementation of uninitialized warnings maintenance
> >>> work for D.
> >>>
> >>>
> >>> Let me know if you have any comments and suggestions on my current and
> >>> future work.
> >>>
> >>> Thanks a lot for your help.
> >>>
> >>> Qing
> >>>
> >>>> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches
> >>>> <gcc-patches@gcc.gnu.org> wrote:
> >>>>
> >>>> The following are the approaches I will implement and compare:
> >>>>
> >>>> Our final goal is to keep the uninitialized warning and minimize the
> >>>> run-time performance cost.
> >>>>
> >>>> A. Adding real initialization during gimplification, not maintain the
> >>>> uninitialized warnings.
> >>>> B. Adding real initialization during gimplification, marking them with
> >>>> “artificial_init”.
> >>>> Adjusting uninitialized pass, maintaining the annotation, making sure
> >>>> the real init is not
> >>>> deleted from the fake init.
> >>>> C. Marking the DECL for an uninitialized auto variable as
> >>>> “no_explicit_init” during gimplification,
> >>>> maintain this “no_explicit_init” bit till after
> >>>> pass_late_warn_uninitialized, or till pass_expand,
> >>>> add real initialization for all DECLs that are marked with
> >>>> “no_explicit_init”.
> >>>> D. Adding .DEFERRED_INIT during gimplification, expand the
> >>>> .DEFERRED_INIT during expand to
> >>>> real initialization. Adjusting uninitialized pass with the new refs
> >>>> with “.DEFERRED_INIT”.
> >>>>
> >>>>
> >>>> In the above, approach A will be the one that has the minimum run-time
> >>>> cost, and will be the baseline for the performance
> >>>> comparison.
> >>>>
> >>>> I will implement approach D then, this one is expected to have the most
> >>>> run-time overhead among the above list, but
> >>>> implementation should be the cleanest among B, C, D. Let’s see how much
> >>>> more performance overhead this approach
> >>>> will have. If the data is good, maybe we can avoid the effort to implement
> >>>> B and C.
> >>>>
> >>>> If the performance of D is not good, I will implement B or C at that
> >>>> time.
> >>>>
> >>>> Let me know if you have any comments or suggestions.
> >>>>
> >>>> Thanks.
> >>>>
> >>>> Qing
> >>>
> >>
> >>
> >
> > --
> > Richard Biener <rguent...@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
>

--
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
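
For readers who want to re-check the "avg" row of the quoted CPU2017 table, here is a small sketch that recomputes it as a plain arithmetic mean over the 19 benchmarks. The numbers are copied from the table. The benchmark names use the `_r` suffix throughout, which is an assumption for the entries the table shows truncated, and the variable and function names are illustrative only, not part of any patch.

```python
# Slowdown in percent, copied from the quoted table: (A / no, D / no).
slowdown = {
    "500.perlbench_r": (1.25, 1.25),
    "502.gcc_r": (0.68, 1.80),
    "505.mcf_r": (0.68, 0.14),
    "520.omnetpp_r": (4.83, 4.68),
    "523.xalancbmk_r": (0.18, 1.96),
    "525.x264_r": (1.55, 2.07),
    "531.deepsjeng_r": (11.57, 11.85),
    "541.leela_r": (0.64, 0.80),
    "557.xz_r": (-0.41, -0.41),
    "507.cactuBSSN_r": (0.44, 0.44),
    "508.namd_r": (0.34, 0.34),
    "510.parest_r": (0.17, 0.25),
    "511.povray_r": (56.57, 57.27),
    "519.lbm_r": (0.00, 0.00),
    "521.wrf_r": (-0.28, -0.37),
    "526.blender_r": (16.96, 17.71),
    "527.cam4_r": (0.70, 0.53),
    "538.imagick_r": (2.40, 2.40),
    "544.nab_r": (0.00, -0.65),
}

# The three benchmarks the mail singles out as significant slowdowns.
outliers = {"511.povray_r", "526.blender_r", "531.deepsjeng_r"}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Mean over all benchmarks, per approach.
avg_a = mean(a for a, _ in slowdown.values())
avg_d = mean(d for _, d in slowdown.values())

# Mean over the benchmarks that are not in the outlier set.
rest_a = mean(a for name, (a, _) in slowdown.items() if name not in outliers)
rest_d = mean(d for name, (_, d) in slowdown.items() if name not in outliers)

print(f"all benchmarks:      A {avg_a:.2f}%  D {avg_d:.2f}%")
print(f"without 3 outliers:  A {rest_a:.2f}%  D {rest_d:.2f}%")
```

The first line of output matches the 5.17% and 5.37% in the table. The second line is not from the original mail: it only shows that, with the three named benchmarks excluded, the mean slowdown for both approaches falls below 1%.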