Hi,
It looks like both attached .csv files were stripped during email delivery;
I am not sure why. So I am copying the text here for your reference:
***benchmarks:
C 500.perlbench_r
C 502.gcc_r
C 505.mcf_r
C++ 520.omnetpp_r
C++ 523.xalancbmk_r
C 525.x264_r
C++ 531.deepsjeng_r
C++ 541.leela_r
C 557.xz_r
C++/C/Fortran 507.cactuBSSN_r
C++ 508.namd_r
C++ 510.parest_r
C++/C 511.povray_r
C 519.lbm_r
Fortran/C 521.wrf_r
C++/C 526.blender_r
Fortran/C 527.cam4_r
C 538.imagick_r
C 544.nab_r
***runtime overhead data and code size overhead data: I converted them to PDF
files; hopefully this time they can be attached to the email.
thanks.
Qing
> On Sep 3, 2020, at 9:29 AM, Qing Zhao via Gcc-patches
> <[email protected]> wrote:
>
> Hi,
>
> Per request, I collected runtime performance data and code size data with
> CPU2017 on a X86 platform.
>
> *** Machine info:
> model name: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> $ lscpu | grep NUMA
> NUMA node(s): 2
> NUMA node0 CPU(s): 0-21,44-65
> NUMA node1 CPU(s): 22-43,66-87
>
> ***CPU2017 benchmarks:
> all the benchmarks that contain C/C++ code: 9 integer benchmarks and 10 FP
> benchmarks.
>
> ***Configurations:
> Intrate and fprate, 22 copies.
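>
> For reference (a sketch, not the exact command used; the config file name
> "myconf" is hypothetical), a rate run matching the setup above could be
> launched as:
>
>     runcpu --config=myconf --copies=22 --iterations=3 intrate fprate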
>
> ***Compiler options:
> no : -g -O2 -march=native
> used_gpr_arg: no + -fzero-call-used-regs=used-gpr-arg
> used_arg: no + -fzero-call-used-regs=used-arg
> all_arg: no + -fzero-call-used-regs=all-arg
> used_gpr: no + -fzero-call-used-regs=used-gpr
> all_gpr: no + -fzero-call-used-regs=all-gpr
> used: no + -fzero-call-used-regs=used
> all: no + -fzero-call-used-regs=all
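>
> As a minimal illustration (my sketch, not one of the measured
> configurations), the option makes each function epilogue zero the selected
> call-used registers before returning:
>
>     /* zero.c -- illustrative only; the register names assume the
>        x86-64 System V calling convention.  */
>     int
>     sum (int a, int b)
>     {
>       return a + b;   /* a and b arrive in %edi and %esi.  */
>     }
>
>     /* Compile with:
>          gcc -O2 -march=native -fzero-call-used-regs=used-gpr-arg -c zero.c
>        Before the final ret, the epilogue zeroes the argument GPRs the
>        function actually used (here %edi and %esi), so stale values cannot
>        leak to the caller or feed ROP gadgets; the return value in %eax is
>        of course preserved.  */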
>
> ***Each benchmark runs 3 times.
>
> ***runtime performance data:
> Please see the attached csv file
>
>
> From the data, we can see that:
> On average, all the options starting with “used_…” (i.e., only the registers
> that are used in the routine are zeroed) have very low runtime overhead: at
> most 1.72% for the integer benchmarks and 1.17% for the FP benchmarks.
> If all the registers are zeroed, the runtime overhead is bigger: on average
> for the integer benchmarks, all_arg is 5.7%, all_gpr is 3.5%, and all is
> 17.56%.
> It looks like the overhead of zeroing vector registers is much bigger.
>
> For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, and
> its runtime overhead is very small.
>
> ***code size increase data:
>
> Please see the attached file
>
>
> From the data, we can see that:
> The code size impact is in general very small; the biggest is “all_arg”, at
> 1.06% for the integer benchmarks and 1.13% for the FP benchmarks.
>
> So, from the data collected, I think that the run-time overhead and code size
> increase from this option are very reasonable.
>
> Let me know your comments and opinions.
>
> thanks.
>
> Qing
>
>> On Aug 25, 2020, at 4:54 PM, Qing Zhao via Gcc-patches
>> <[email protected]> wrote:
>>
>>
>>
>>> On Aug 24, 2020, at 3:20 PM, Segher Boessenkool
>>> <[email protected]> wrote:
>>>
>>> Hi!
>>>
>>> On Mon, Aug 24, 2020 at 01:02:03PM -0500, Qing Zhao wrote:
>>>>> On Aug 24, 2020, at 12:49 PM, Segher Boessenkool
>>>>> <[email protected]> wrote:
>>>>> On Wed, Aug 19, 2020 at 06:27:45PM -0500, Qing Zhao wrote:
>>>>>>> On Aug 19, 2020, at 5:57 PM, Segher Boessenkool
>>>>>>> <[email protected]> wrote:
>>>>>>> Numbers on how expensive this is (for what arch, in code size and in
>>>>>>> execution time) would be useful. If it is so expensive that no one will
>>>>>>> use it, it helps security at most none at all :-(
>>>>>
>>>>> Without numbers on this, no one can determine if it is a good tradeoff
>>>>> for them. And we (the GCC people) cannot know if it will be useful for
>>>>> enough users that it will be worth the effort for us. Which is why I
>>>>> keep hammering on this point.
>>>> I can collect some run-time overhead data on this. Do you have a
>>>> recommendation on what test suite I can use for this testing? (Is
>>>> CPU2017 good enough?)
>>>
>>> I would use something more real-life, not 12 small pieces of code.
>>
>> There is some basic information about the CPU2017 benchmarks at the link
>> below:
>>
>> https://www.spec.org/cpu2017/Docs/overview.html#suites
>>
>> GCC itself is one of the benchmarks in CPU2017 (502.gcc_r), and
>> 526.blender_r is even larger than 502.gcc_r. There are several other quite
>> big benchmarks as well (perlbench, xalancbmk, parest, imagick, etc.).
>>
>> thanks.
>>
>> Qing