> On Jan 13, 2021, at 1:39 AM, Richard Biener <rguent...@suse.de> wrote:
>
> On Tue, 12 Jan 2021, Qing Zhao wrote:
>
>> Hi,
>>
>> Just checking in to see whether you have any comments or suggestions on this.
>>
>> FYI, I have been continuing with the implementation of Approach D since last week:
>>
>> D. Add calls to .DEFERRED_INIT during gimplification, and expand
>> .DEFERRED_INIT into real initialization during expand. Adjust the
>> uninitialized pass to handle the new references to “.DEFERRED_INIT”.
>>
>> The remaining work for Approach D was:
>>
>> ** complete the implementation of -ftrivial-auto-var-init=pattern;
>> ** complete the uninitialized warnings maintenance work for D.
>>
>> I have completed the uninitialized warnings maintenance work for D,
>> and have partially finished the -ftrivial-auto-var-init=pattern implementation.
>>
>> The following work remains for Approach D:
>>
>> ** -ftrivial-auto-var-init=pattern for VLAs;
>> ** add a new variable attribute:
>> __attribute__((uninitialized))
>> a variable marked with it is intentionally left uninitialized for performance;
>> ** add complete test cases.
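To make the proposed attribute concrete, here is a minimal sketch of how a marked variable might look in user code. The macro name SCRATCH_UNINIT and the function sum_bytes are made up for illustration, and the spelling assumes the __attribute__((uninitialized)) form. The __has_attribute guard only lets the code build on compilers that do not know the attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper macro: expands to the attribute only where the
   compiler reports support for it, and to nothing otherwise.  */
#if defined(__has_attribute)
#  if __has_attribute(uninitialized)
#    define SCRATCH_UNINIT __attribute__((uninitialized))
#  endif
#endif
#ifndef SCRATCH_UNINIT
#  define SCRATCH_UNINIT
#endif

/* Sums n bytes by staging them through a local buffer in chunks.
   The buffer is always written by memcpy before it is read, so automatic
   initialization of it would be pure overhead; marking it asks the
   compiler to skip that initialization.  */
unsigned long sum_bytes(const unsigned char *data, size_t n)
{
    unsigned char scratch[4096] SCRATCH_UNINIT;
    unsigned long total = 0;

    assert(data != NULL || n == 0);

    while (n > 0) {
        size_t chunk = n < sizeof scratch ? n : sizeof scratch;

        memcpy(scratch, data, chunk);          /* fill before any read */
        for (size_t i = 0; i < chunk; i++)
            total += scratch[i];
        data += chunk;
        n -= chunk;
    }
    return total;
}
```

The intent is that every other automatic variable in the translation unit still gets initialized, and only the explicitly marked buffer is left alone.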
>>
>>
>> Please let me know if you have any objection to my current decision to
>> implement approach D.
>
> Did you do any analysis on how stack usage and code size are changed
> with approach D?
I did a code size comparison (I will provide the data in another
email). Based on this data, D works better than A in general. (This actually
surprised me.)
I have not measured stack usage, though. I am not sure how to collect stack
usage data; do you have any suggestions?
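One possibility might be GCC's -fstack-usage option, which writes per-function stack usage to a .su file for each translation unit. Below is a minimal sketch of a helper that totals such a file. It assumes each line has the form "location:function<TAB>bytes<TAB>qualifiers"; the names parse_su_line and sum_su_file are hypothetical, and lines longer than the buffer are not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Extracts the byte count from one .su line (assumed format:
   "file:line:col:function\tBYTES\tqualifier").
   Returns 0 and stores the count on success, -1 on a malformed line.  */
int parse_su_line(const char *line, unsigned long *bytes)
{
    const char *tab;
    char *end;
    unsigned long value;

    assert(line != NULL && bytes != NULL);

    tab = strchr(line, '\t');
    if (tab == NULL)
        return -1;                      /* no size field at all */

    value = strtoul(tab + 1, &end, 10);
    if (end == tab + 1)
        return -1;                      /* size field is not a number */

    *bytes = value;
    return 0;
}

/* Sums the byte counts of all well-formed lines read from fp.  */
unsigned long sum_su_file(FILE *fp)
{
    char line[1024];
    unsigned long total = 0;
    unsigned long bytes;

    while (fgets(line, sizeof line, fp) != NULL)
        if (parse_su_line(line, &bytes) == 0)
            total += bytes;
    return total;
}
```

Comparing the totals, or the per-function values, from builds with and without -ftrivial-auto-var-init=zero would give a rough picture of the stack usage change. The sum of static frame sizes is only a proxy, not the actual maximum stack depth at run time.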
> How does compile-time behave (we could gobble up
> lots of .DEFERRED_INIT calls I guess)?
I can collect this data too and report it later.
Thanks.
Qing
>
> Richard.
>
>> Thanks a lot for your help.
>>
>> Qing
>>
>>
>>> On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches
>>> <gcc-patches@gcc.gnu.org> wrote:
>>>
>>> Hi,
>>>
>>> This is an update for our previous discussion.
>>>
>>> 1. I implemented the following two approaches on top of the latest
>>> upstream gcc:
>>>
>>> A. Add real initialization during gimplification, without maintaining the
>>> uninitialized warnings.
>>>
>>> D. Add calls to .DEFERRED_INIT during gimplification, and expand
>>> .DEFERRED_INIT into real initialization during expand. Adjust the
>>> uninitialized pass to handle the new references to “.DEFERRED_INIT”.
>>>
>>> Note, in this initial implementation,
>>> ** I ONLY implemented -ftrivial-auto-var-init=zero; the implementation of
>>> -ftrivial-auto-var-init=pattern is not done yet. Therefore, the performance
>>> data is only about -ftrivial-auto-var-init=zero.
>>>
>>> ** I added a temporary option -fauto-var-init-approach=A|B|C|D to
>>> choose implementation A or D for the runtime performance study.
>>> ** I have not finished the uninitialized warnings maintenance work for D.
>>> (That might take more time than I expected.)
>>>
>>> 2. I collected runtime data for CPU2017 on an x86 machine with this new gcc
>>> for the following 3 cases:
>>>
>>> no: default. (-g -O2 -march=native )
>>> A: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=A
>>> D: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=D
>>>
>>> I then computed the slowdown for both A and D as follows:
>>>
>>> benchmarks        A / no   D / no
>>>
>>> 500.perlbench_r    1.25%    1.25%
>>> 502.gcc_r          0.68%    1.80%
>>> 505.mcf_r          0.68%    0.14%
>>> 520.omnetpp_r      4.83%    4.68%
>>> 523.xalancbmk_r    0.18%    1.96%
>>> 525.x264_r         1.55%    2.07%
>>> 531.deepsjeng_r   11.57%   11.85%
>>> 541.leela_r        0.64%    0.80%
>>> 557.xz_r          -0.41%   -0.41%
>>>
>>> 507.cactuBSSN_r    0.44%    0.44%
>>> 508.namd_r         0.34%    0.34%
>>> 510.parest_r       0.17%    0.25%
>>> 511.povray_r      56.57%   57.27%
>>> 519.lbm_r          0.00%    0.00%
>>> 521.wrf_r         -0.28%   -0.37%
>>> 526.blender_r     16.96%   17.71%
>>> 527.cam4_r         0.70%    0.53%
>>> 538.imagick_r      2.40%    2.40%
>>> 544.nab_r          0.00%   -0.65%
>>>
>>> avg                5.17%    5.37%
>>>
>>> From the above data, we can see that, in general, the runtime slowdown
>>> of implementations A and D is similar for individual benchmarks.
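As a cross-check of the table, here is a minimal sketch of the arithmetic. It assumes the per-benchmark slowdown is (test time / baseline time - 1) * 100 and that "avg" is a plain arithmetic mean over all 19 benchmarks; the message does not spell out either, and the function and array names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed slowdown metric: relative increase in run time, in percent.  */
double slowdown_percent(double base_seconds, double test_seconds)
{
    assert(base_seconds > 0.0);
    return (test_seconds / base_seconds - 1.0) * 100.0;
}

/* Arithmetic mean of n values.  */
double mean(const double *v, size_t n)
{
    double sum = 0.0;

    assert(v != NULL && n > 0);
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Slowdown columns copied from the table, in percent, in table order.  */
const double slowdown_A[19] = {
    1.25, 0.68, 0.68, 4.83, 0.18, 1.55, 11.57, 0.64, -0.41,
    0.44, 0.34, 0.17, 56.57, 0.00, -0.28, 16.96, 0.70, 2.40, 0.00
};
const double slowdown_D[19] = {
    1.25, 1.80, 0.14, 4.68, 1.96, 2.07, 11.85, 0.80, -0.41,
    0.44, 0.34, 0.25, 57.27, 0.00, -0.37, 17.71, 0.53, 2.40, -0.65
};
```

Under these assumptions, mean(slowdown_A, 19) and mean(slowdown_D, 19) round to the reported 5.17% and 5.37%. Since 511.povray_r alone contributes roughly 3 percentage points to each average, a median or geometric mean would tell a noticeably different story.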
>>>
>>> Several benchmarks slow down significantly with the newly added
>>> initialization under both A and D, for example 511.povray_r,
>>> 526.blender_r, and 531.deepsjeng_r. I will study in more detail which
>>> new initializations introduce this slowdown.
>>>
>>> From the study so far, I think that approach D should be good
>>> enough for our final implementation.
>>> So, I will try to finish approach D with the following remaining work:
>>>
>>> ** complete the implementation of -ftrivial-auto-var-init=pattern;
>>> ** complete the uninitialized warnings maintenance work for D.
>>>
>>>
>>> Let me know if you have any comments and suggestions on my current and
>>> future work.
>>>
>>> Thanks a lot for your help.
>>>
>>> Qing
>>>
>>>> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches
>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>
>>>> The following are the approaches I will implement and compare:
>>>>
>>>> Our final goal is to keep the uninitialized warning and minimize the
>>>> run-time performance cost.
>>>>
>>>> A. Add real initialization during gimplification, without maintaining the
>>>> uninitialized warnings.
>>>> B. Add real initialization during gimplification, marking it with
>>>> “artificial_init”. Adjust the uninitialized pass and maintain the
>>>> annotation, making sure the real init is not deleted from the fake init.
>>>> C. Mark the DECL of an uninitialized auto variable as
>>>> “no_explicit_init” during gimplification, maintain this
>>>> “no_explicit_init” bit until after pass_late_warn_uninitialized, or until
>>>> pass_expand, then add real initialization for all DECLs that are marked
>>>> with “no_explicit_init”.
>>>> D. Add .DEFERRED_INIT during gimplification, and expand .DEFERRED_INIT
>>>> into real initialization during expand. Adjust the uninitialized pass to
>>>> handle the new references to “.DEFERRED_INIT”.
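To illustrate why A gives up the warning while D can keep it, here is a hand-written C model of what later passes would see. This is only an analogy: the real transformation happens on GIMPLE, and model_deferred_init_zero is a made-up stand-in for the internal .DEFERRED_INIT call, not something user code can write.

```c
#include <assert.h>

/* Original user code, shown as a comment because reading `value` when
   have_input == 0 is undefined behavior:

       int pick(int have_input, int input)
       {
           int value;            // no initializer
           if (have_input)
               value = input;
           return value;         // candidate for -Wmaybe-uninitialized
       }
*/

/* Model of approach A: a real store of 0 is added up front.  To any
   later analysis, `value` now looks properly initialized, so there is
   nothing left to warn about.  */
int pick_model_A(int have_input, int input)
{
    int value = 0;

    if (have_input)
        value = input;
    return value;
}

/* Hypothetical stand-in for the internal call.  In approach D the
   uninitialized pass would recognize the call and still treat its result
   as "not initialized by the user"; only at expand time would it become
   a real store (zero, in the =zero mode modeled here).  */
static int model_deferred_init_zero(void)
{
    return 0;
}

/* Model of approach D: same run-time result as A, but the origin of the
   initial value stays identifiable until late in compilation.  */
int pick_model_D(int have_input, int input)
{
    int value = model_deferred_init_zero();

    if (have_input)
        value = input;
    return value;
}
```

Both models return the same values at run time; the difference is only in what the compiler can still tell about where the initial value came from.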
>>>>
>>>>
>>>> In the above, approach A is the one that has the minimum run-time
>>>> cost, and will be the baseline for the performance comparison.
>>>>
>>>> I will then implement approach D. It is expected to have the most
>>>> run-time overhead among the above list, but its implementation should be
>>>> the cleanest among B, C, and D. Let’s see how much more performance
>>>> overhead this approach has. If the data is good, maybe we can avoid the
>>>> effort of implementing B and C.
>>>>
>>>> If the performance of D is not good, I will implement B or C at that time.
>>>>
>>>> Let me know if you have any comments or suggestions.
>>>>
>>>> Thanks.
>>>>
>>>> Qing
>>>
>>
>>
>
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)