On Wed, 13 Jan 2021, Qing Zhao wrote:

>
> > On Jan 13, 2021, at 9:10 AM, Richard Biener <rguent...@suse.de> wrote:
> >
> > On Wed, 13 Jan 2021, Qing Zhao wrote:
> >
> >>
> >>> On Jan 13, 2021, at 1:39 AM, Richard Biener <rguent...@suse.de> wrote:
> >>>
> >>> On Tue, 12 Jan 2021, Qing Zhao wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> Just checking in to see whether you have any comments and suggestions on
> >>>> this:
> >>>>
> >>>> FYI, I have been continuing with the Approach D implementation since
> >>>> last week:
> >>>>
> >>>> D. Adding calls to .DEFFERED_INIT during gimplification, expand the
> >>>> .DEFFERED_INIT during expand to
> >>>> real initialization. Adjusting uninitialized pass with the new refs with
> >>>> “.DEFFERED_INIT”.
> >>>>
> >>>> For the remaining work of Approach D:
> >>>>
> >>>> ** complete the implementation of -ftrivial-auto-var-init=pattern;
> >>>> ** complete the implementation of uninitialized warnings maintenance
> >>>> work for D.
> >>>>
> >>>> I have completed the uninitialized warnings maintenance work for D.
> >>>> And finished part of the -ftrivial-auto-var-init=pattern
> >>>> implementation.
> >>>>
> >>>> The following is the remaining work of Approach D:
> >>>>
> >>>> ** -ftrivial-auto-var-init=pattern for VLA;
> >>>> ** add a new attribute for variable:
> >>>> __attribute__((uninitialized))
> >>>> the marked variable is uninitialized intentionally for performance
> >>>> purpose.
> >>>> ** adding complete testing cases;
> >>>>
> >>>> Please let me know if you have any objection to my current decision on
> >>>> implementing approach D.
> >>>
> >>> Did you do any analysis on how stack usage and code size are changed
> >>> with approach D?
> >>
> >> I did the code size change comparison (I will provide the data in another
> >> email). And with this data, D works better than A in general. (This is
> >> a surprise to me actually).
> >>
> >> But not the stack usage.
> >> Not sure how to collect the stack usage data,
> >> do you have any suggestion on this?
> >
> > There is -fstack-usage you could use, then of course watching
> > the stack segment at runtime.
>
> I can do this for CPU2017 to collect the stack usage data and report back.
>
> > I'm mostly concerned about
> > stack-limited "processes" such as the linux kernel which I think
> > is a primary target of your work.
>
> I don’t have any experience building the Linux kernel.
> Do we have to collect data for the Linux kernel at this time? Is CPU2017
> data not enough?
Well, it depends on the desired target.  The Linux kernel has an 8kb
hard stack limit for kernel threads on x86_64 (IIRC).  You don't have
to do anything, it was just a suggestion.  For normal programs, stack
usage is probably the least important problem.

Richard.

> Qing
> >
> > Richard.
> >
> >>
> >>> How does compile-time behave (we could gobble up
> >>> lots of .DEFERRED_INIT calls I guess)?
> >> I can collect this data too and report it later.
> >>
> >> Thanks.
> >>
> >> Qing
> >>>
> >>> Richard.
> >>>
> >>>> Thanks a lot for your help.
> >>>>
> >>>> Qing
> >>>>
> >>>>> On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches
> >>>>> <gcc-patches@gcc.gnu.org> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> This is an update for our previous discussion.
> >>>>>
> >>>>> 1. I implemented the following two different implementations in the
> >>>>> latest upstream gcc:
> >>>>>
> >>>>> A. Adding real initialization during gimplification, not maintaining
> >>>>> the uninitialized warnings.
> >>>>>
> >>>>> D. Adding calls to .DEFFERED_INIT during gimplification, expand the
> >>>>> .DEFFERED_INIT during expand to
> >>>>> real initialization. Adjusting uninitialized pass with the new refs
> >>>>> with “.DEFFERED_INIT”.
> >>>>>
> >>>>> Note, in this initial implementation,
> >>>>> ** I ONLY implement -ftrivial-auto-var-init=zero, the
> >>>>> implementation of -ftrivial-auto-var-init=pattern
> >>>>> is not done yet. Therefore, the performance data is only
> >>>>> about -ftrivial-auto-var-init=zero.
> >>>>>
> >>>>> ** I added a temporary option
> >>>>> -fauto-var-init-approach=A|B|C|D to choose implementation A or D for
> >>>>> runtime performance study.
> >>>>> ** I didn’t finish the uninitialized warnings maintenance work
> >>>>> for D. (That might take more time than I expected).
> >>>>>
> >>>>> 2. I collected runtime data for CPU2017 on an x86 machine with this new
> >>>>> gcc for the following 3 cases:
> >>>>>
> >>>>> no: default.
> >>>>> (-g -O2 -march=native)
> >>>>> A: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=A
> >>>>> D: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=D
> >>>>>
> >>>>> And then compute the slowdown data for both A and D as follows:
> >>>>>
> >>>>> benchmarks          A / no     D / no
> >>>>>
> >>>>> 500.perlbench_r      1.25%      1.25%
> >>>>> 502.gcc_r            0.68%      1.80%
> >>>>> 505.mcf_r            0.68%      0.14%
> >>>>> 520.omnetpp_r        4.83%      4.68%
> >>>>> 523.xalancbmk_r      0.18%      1.96%
> >>>>> 525.x264_r           1.55%      2.07%
> >>>>> 531.deepsjeng_r     11.57%     11.85%
> >>>>> 541.leela_r          0.64%      0.80%
> >>>>> 557.xz_             -0.41%     -0.41%
> >>>>>
> >>>>> 507.cactuBSSN_r      0.44%      0.44%
> >>>>> 508.namd_r           0.34%      0.34%
> >>>>> 510.parest_r         0.17%      0.25%
> >>>>> 511.povray_r        56.57%     57.27%
> >>>>> 519.lbm_r            0.00%      0.00%
> >>>>> 521.wrf_r           -0.28%     -0.37%
> >>>>> 526.blender_r       16.96%     17.71%
> >>>>> 527.cam4_r           0.70%      0.53%
> >>>>> 538.imagick_r        2.40%      2.40%
> >>>>> 544.nab_r            0.00%     -0.65%
> >>>>>
> >>>>> avg                  5.17%      5.37%
> >>>>>
> >>>>> From the above data, we can see that in general, the runtime
> >>>>> performance slowdown for
> >>>>> implementation A and D are similar for individual benchmarks.
> >>>>>
> >>>>> There are several benchmarks that have significant slowdown with the
> >>>>> newly added initialization for both
> >>>>> A and D, for example, 511.povray_r, 526.blender_r, and 531.deepsjeng_r.
> >>>>> I will try to study a little bit
> >>>>> more what kind of new initializations introduced such slowdown.
> >>>>>
> >>>>> From the current study so far, I think that approach D should be good
> >>>>> enough for our final implementation.
> >>>>> So, I will try to finish approach D with the following remaining work
> >>>>>
> >>>>> ** complete the implementation of -ftrivial-auto-var-init=pattern;
> >>>>> ** complete the implementation of uninitialized warnings maintenance
> >>>>> work for D.
> >>>>>
> >>>>> Let me know if you have any comments and suggestions on my current and
> >>>>> future work.
> >>>>>
> >>>>> Thanks a lot for your help.
> >>>>>
> >>>>> Qing
> >>>>>
> >>>>>> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches
> >>>>>> <gcc-patches@gcc.gnu.org> wrote:
> >>>>>>
> >>>>>> The following are the approaches I will implement and compare:
> >>>>>>
> >>>>>> Our final goal is to keep the uninitialized warning and minimize the
> >>>>>> run-time performance cost.
> >>>>>>
> >>>>>> A. Adding real initialization during gimplification, not maintaining
> >>>>>> the uninitialized warnings.
> >>>>>> B. Adding real initialization during gimplification, marking them with
> >>>>>> “artificial_init”.
> >>>>>> Adjusting uninitialized pass, maintaining the annotation, making sure
> >>>>>> the real init is not deleted from the fake init.
> >>>>>> C. Marking the DECL for an uninitialized auto variable as
> >>>>>> “no_explicit_init” during gimplification,
> >>>>>> maintain this “no_explicit_init” bit till after
> >>>>>> pass_late_warn_uninitialized, or till pass_expand,
> >>>>>> add real initialization for all DECLs that are marked with
> >>>>>> “no_explicit_init”.
> >>>>>> D. Adding .DEFFERED_INIT during gimplification, expand the
> >>>>>> .DEFFERED_INIT during expand to
> >>>>>> real initialization. Adjusting uninitialized pass with the new refs
> >>>>>> with “.DEFFERED_INIT”.
> >>>>>>
> >>>>>> In the above, approach A will be the one that has the minimum
> >>>>>> run-time cost, and will be the base for the performance
> >>>>>> comparison.
> >>>>>>
> >>>>>> I will implement approach D then. This one is expected to have the
> >>>>>> most run-time overhead among the above list, but the
> >>>>>> implementation should be the cleanest among B, C, D. Let’s see how
> >>>>>> much more performance overhead this approach
> >>>>>> will have. If the data is good, maybe we can avoid the effort to
> >>>>>> implement B and C.
> >>>>>>
> >>>>>> If the performance of D is not good, I will implement B or C at that
> >>>>>> time.
> >>>>>>
> >>>>>> Let me know if you have any comment or suggestions.
> >>>>>>
> >>>>>> Thanks.
> >>>>>>
> >>>>>> Qing
> >>>>>
> >>>>
> >>>
> >>> --
> >>> Richard Biener <rguent...@suse.de>
> >>> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> >>> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> >>
> >
> > --
> > Richard Biener <rguent...@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
>

--
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
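
A note on the CPU2017 numbers quoted in this thread: the "avg" row appears to be a plain arithmetic mean over the 19 per-benchmark slowdowns. The sketch below is not part of the original messages. It only re-derives the averages from the quoted figures and lists the benchmarks above a 10% slowdown. The shortened benchmark names, the helper name mean_slowdown, and the 10% threshold are assumptions chosen for illustration.

```python
# Slowdowns in percent, copied from the quoted table as (A / no, D / no).
# Benchmark names are shortened here for readability (an assumption of
# this sketch, not the labels used in the thread).
slowdowns = {
    "perlbench": (1.25, 1.25),
    "gcc": (0.68, 1.80),
    "mcf": (0.68, 0.14),
    "omnetpp": (4.83, 4.68),
    "xalancbmk": (0.18, 1.96),
    "x264": (1.55, 2.07),
    "deepsjeng": (11.57, 11.85),
    "leela": (0.64, 0.80),
    "xz": (-0.41, -0.41),
    "cactuBSSN": (0.44, 0.44),
    "namd": (0.34, 0.34),
    "parest": (0.17, 0.25),
    "povray": (56.57, 57.27),
    "lbm": (0.00, 0.00),
    "wrf": (-0.28, -0.37),
    "blender": (16.96, 17.71),
    "cam4": (0.70, 0.53),
    "imagick": (2.40, 2.40),
    "nab": (0.00, -0.65),
}


def mean_slowdown(rows, column):
    """Arithmetic mean of one column (0 = approach A, 1 = approach D)."""
    values = [pair[column] for pair in rows.values()]
    return sum(values) / len(values)


avg_a = mean_slowdown(slowdowns, 0)
avg_d = mean_slowdown(slowdowns, 1)

# Benchmarks where both approaches slow down by more than 10 percent
# (the threshold is an illustrative choice).
outliers = [name for name, (a, d) in slowdowns.items() if a > 10 and d > 10]

print(f"A / no: {avg_a:.2f}%   D / no: {avg_d:.2f}%")
print("above 10% for both:", ", ".join(outliers))
```

With the quoted figures this gives 5.17% for A and 5.37% for D, matching the "avg" row, and the benchmarks above 10% are the same three singled out in the Jan 5 message.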