Hi,

More data on code size and compilation time with CPU2017:
******** Compilation time data: the numbers are the slowdown against the default "no":

benchmarks          A/no      D/no
500.perlbench_r     5.19%     1.95%
502.gcc_r           0.46%    -0.23%
505.mcf_r           0.00%     0.00%
520.omnetpp_r       0.85%     0.00%
523.xalancbmk_r     0.79%    -0.40%
525.x264_r         -4.48%     0.00%
531.deepsjeng_r    16.67%    16.67%
541.leela_r         0.00%     0.00%
557.xz_r            0.00%     0.00%
507.cactuBSSN_r     1.16%     0.58%
508.namd_r          9.62%     8.65%
510.parest_r        0.48%     1.19%
511.povray_r        3.70%     3.70%
519.lbm_r           0.00%     0.00%
521.wrf_r           0.05%     0.02%
526.blender_r       0.33%     1.32%
527.cam4_r         -0.93%    -0.93%
538.imagick_r       1.32%     3.95%
544.nab_r           0.00%     0.00%

From the above data, it looks like the compilation-time impact of implementations A and D is almost the same.

******** Code size data: the numbers are the code size increase against the default "no":

benchmarks          A/no      D/no
500.perlbench_r     2.84%     0.34%
502.gcc_r           2.59%     0.35%
505.mcf_r           3.55%     0.39%
520.omnetpp_r       0.54%     0.03%
523.xalancbmk_r     0.36%     0.39%
525.x264_r          1.39%     0.13%
531.deepsjeng_r     2.15%    -1.12%
541.leela_r         0.50%    -0.20%
557.xz_r            0.31%     0.13%
507.cactuBSSN_r     5.00%    -0.01%
508.namd_r          3.64%    -0.07%
510.parest_r        1.12%     0.33%
511.povray_r        4.18%     1.16%
519.lbm_r           8.83%     6.44%
521.wrf_r           0.08%     0.02%
526.blender_r       1.63%     0.45%
527.cam4_r          0.16%     0.06%
538.imagick_r       3.18%    -0.80%
544.nab_r           5.76%    -1.11%

Avg                 2.52%     0.36%

From the above data, implementation D is consistently better than A on code size. This is surprising to me; I am not sure what the reason is.

******** Stack usage data: I added -fstack-usage to the compilation line when compiling the CPU2017 benchmarks, so a *.su file was generated for each module, with the stack-size information embedded in it. Since there are a lot of such files, I picked one benchmark, 511.povray, to check; it is the one with the largest runtime overhead when the initialization is added (for both A and D). I identified all the *.su files that differ between A and D and diffed them. It looks like the stack size is much higher with D than with A, for example:

$ diff build_base_auto_init.D.0000/bbox.su build_base_auto_init.A.0000/bbox.su
5c5
< bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**, pov::BBOX_TREE**&, long int*, long int, long int)  160  static
---
> bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**, pov::BBOX_TREE**&, long int*, long int, long int)  96  static

$ diff build_base_auto_init.D.0000/image.su build_base_auto_init.A.0000/image.su
9c9
< image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, double*)  624  static
---
> image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, double*)  272  static

….

It looks like implementation D has a larger stack-size impact than A. Do you have any insight into the reason for this?
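For reference, here is a small, self-contained sketch (not taken from the benchmark sources) of how I picture the difference between the two approaches at the point where the initialization is introduced; the .DEFERRED_INIT call shown in the comment is only a placeholder for whatever arguments the internal function finally takes:

/* Illustrative only: a function with an uninitialized local aggregate,
   similar in shape to the povray routines in the *.su diffs above.  */
struct bounds { double lower[3], upper[3]; };

double
total_volume (const struct bounds *in, long n)
{
  struct bounds tmp;   /* auto variable without an explicit initializer */
  double vol = 0.0;
  long i;

  for (i = 0; i < n; i++)
    {
      tmp = in[i];
      vol += (tmp.upper[0] - tmp.lower[0])
             * (tmp.upper[1] - tmp.lower[1])
             * (tmp.upper[2] - tmp.lower[2]);
    }
  return vol;
}

/* With -ftrivial-auto-var-init=zero, roughly speaking, the gimplifier adds
   for `tmp':

     approach A:   tmp = {};                    (a real gimple assignment,
                                                 visible to every later pass)

     approach D:   tmp = .DEFERRED_INIT (...);  (an opaque internal call that
                                                 is only turned into a real
                                                 zero-initialization at RTL
                                                 expansion; the argument list
                                                 here is just a placeholder)

   The -fstack-usage numbers above are comparing how these two forms end up
   after optimization.  */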
Let me know if you have any comments and suggestions.

Thanks.

Qing


> On Jan 13, 2021, at 1:39 AM, Richard Biener <rguent...@suse.de> wrote:
>
> On Tue, 12 Jan 2021, Qing Zhao wrote:
>
>> Hi,
>>
>> Just checking in to see whether you have any comments and suggestions on this:
>>
>> FYI, I have been continuing with the Approach D implementation since last week:
>>
>> D. Adding calls to .DEFERRED_INIT during gimplification, expanding the
>> .DEFERRED_INIT during expand to real initialization. Adjusting the
>> uninitialized pass to handle the new refs with ".DEFERRED_INIT".
>>
>> For the remaining work of Approach D:
>>
>> ** complete the implementation of -ftrivial-auto-var-init=pattern;
>> ** complete the implementation of the uninitialized warnings maintenance
>> work for D.
>>
>> I have completed the uninitialized warnings maintenance work for D,
>> and finished part of the -ftrivial-auto-var-init=pattern implementation.
>>
>> The following is the remaining work for Approach D:
>>
>> ** -ftrivial-auto-var-init=pattern for VLAs;
>> ** add a new attribute for variables:
>>    __attribute__((uninitialized))
>>    the marked variable is intentionally left uninitialized for performance reasons;
>> ** adding complete test cases.
>>
>> Please let me know if you have any objection to my current decision to
>> implement approach D.
>
> Did you do any analysis on how stack usage and code size are changed
> with approach D?  How does compile-time behave (we could gobble up
> lots of .DEFERRED_INIT calls I guess)?
>
> Richard.
>
>> Thanks a lot for your help.
>>
>> Qing
>>
>>> On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches
>>> <gcc-patches@gcc.gnu.org> wrote:
>>>
>>> Hi,
>>>
>>> This is an update for our previous discussion.
>>>
>>> 1. I implemented the following two different implementations in the latest
>>> upstream gcc:
>>>
>>> A. Adding real initialization during gimplification, not maintaining the
>>> uninitialized warnings.
>>>
>>> D. Adding calls to .DEFERRED_INIT during gimplification, expanding the
>>> .DEFERRED_INIT during expand to real initialization. Adjusting the
>>> uninitialized pass to handle the new refs with ".DEFERRED_INIT".
>>>
>>> Note, in this initial implementation:
>>> ** I ONLY implement -ftrivial-auto-var-init=zero; the implementation of
>>> -ftrivial-auto-var-init=pattern is not done yet. Therefore, the
>>> performance data is only about -ftrivial-auto-var-init=zero.
>>> ** I added a temporary option -fauto-var-init-approach=A|B|C|D to
>>> choose implementation A or D for the runtime performance study.
>>> ** I didn't finish the uninitialized warnings maintenance work for D.
>>> (That might take more time than I expected.)
>>>
>>> 2. I collected runtime data for CPU2017 on an x86 machine with this new gcc
>>> for the following 3 cases:
>>>
>>> no: default (-g -O2 -march=native)
>>> A:  default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=A
>>> D:  default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=D
>>>
>>> and then computed the slowdown data for both A and D as follows:
>>>
>>> benchmarks          A/no      D/no
>>>
>>> 500.perlbench_r     1.25%     1.25%
>>> 502.gcc_r           0.68%     1.80%
>>> 505.mcf_r           0.68%     0.14%
>>> 520.omnetpp_r       4.83%     4.68%
>>> 523.xalancbmk_r     0.18%     1.96%
>>> 525.x264_r          1.55%     2.07%
>>> 531.deepsjeng_r    11.57%    11.85%
>>> 541.leela_r         0.64%     0.80%
>>> 557.xz_r           -0.41%    -0.41%
>>>
>>> 507.cactuBSSN_r     0.44%     0.44%
>>> 508.namd_r          0.34%     0.34%
>>> 510.parest_r        0.17%     0.25%
>>> 511.povray_r       56.57%    57.27%
>>> 519.lbm_r           0.00%     0.00%
>>> 521.wrf_r          -0.28%    -0.37%
>>> 526.blender_r      16.96%    17.71%
>>> 527.cam4_r          0.70%     0.53%
>>> 538.imagick_r       2.40%     2.40%
>>> 544.nab_r           0.00%    -0.65%
>>>
>>> avg                 5.17%     5.37%
>>>
>>> From the above data, we can see that, in general, the runtime slowdowns for
>>> implementations A and D are similar for individual benchmarks.
>>>
>>> Several benchmarks have a significant slowdown with the newly added
>>> initialization for both A and D, for example 511.povray_r, 526.blender_r,
>>> and 531.deepsjeng_r. I will try to study a little more which of the new
>>> initializations introduce such slowdown.
>>>
>>> From the current study so far, I think that approach D should be good
>>> enough for our final implementation.
>>> So, I will try to finish approach D with the following remaining work:
>>>
>>> ** complete the implementation of -ftrivial-auto-var-init=pattern;
>>> ** complete the implementation of the uninitialized warnings maintenance
>>> work for D.
>>>
>>> Let me know if you have any comments and suggestions on my current and
>>> future work.
>>>
>>> Thanks a lot for your help.
>>>
>>> Qing
>>>
>>>> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches
>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>
>>>> The following are the approaches I will implement and compare:
>>>>
>>>> Our final goal is to keep the uninitialized warnings and minimize the
>>>> run-time performance cost.
>>>>
>>>> A. Adding real initialization during gimplification, not maintaining the
>>>> uninitialized warnings.
>>>> B. Adding real initialization during gimplification, marking it with
>>>> "artificial_init". Adjusting the uninitialized pass, maintaining the
>>>> annotation, making sure the real init is not deleted from the fake init.
>>>> C. Marking the DECL for an uninitialized auto variable as
>>>> "no_explicit_init" during gimplification, maintaining this
>>>> "no_explicit_init" bit till after pass_late_warn_uninitialized, or till
>>>> pass_expand, then adding real initialization for all DECLs that are
>>>> marked with "no_explicit_init".
>>>> D. Adding .DEFERRED_INIT during gimplification, expanding the
>>>> .DEFERRED_INIT during expand to real initialization. Adjusting the
>>>> uninitialized pass to handle the new refs with ".DEFERRED_INIT".
>>>>
>>>> In the above, approach A will be the one that has the minimum run-time
>>>> cost and will be the base for the performance comparison.
>>>>
>>>> I will then implement approach D; this one is expected to have the most
>>>> run-time overhead among the above list, but its implementation should be
>>>> the cleanest among B, C, and D. Let's see how much more performance
>>>> overhead this approach will have. If the data is good, maybe we can avoid
>>>> the effort to implement B and C.
>>>>
>>>> If the performance of D is not good, I will implement B or C at that time.
>>>>
>>>> Let me know if you have any comments or suggestions.
>>>>
>>>> Thanks.
>>>>
>>>> Qing
>>>
>>
>
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
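P.S. The remaining-work list above mentions a new variable attribute, __attribute__((uninitialized)). As a purely hypothetical sketch, assuming the attribute keeps the name and form proposed in this thread (and with fill_buffer standing in for any routine that always overwrites the buffer), its intended use would look something like:

/* Hypothetical example of the proposed attribute: a buffer that is always
   fully overwritten before it is read can opt out of the automatic
   initialization added by -ftrivial-auto-var-init.  */
extern long fill_buffer (char *buf, long len);   /* stand-in routine */

long
checksum_block (long len)
{
  char buf[4096] __attribute__ ((uninitialized));  /* skip the added init */
  long got = fill_buffer (buf, len < 4096 ? len : 4096);
  long sum = 0;

  for (long i = 0; i < got; i++)
    sum += (unsigned char) buf[i];
  return sum;
}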