On Tue, Sep 4, 2018 at 9:01 AM, H.J. Lu <hjl.to...@gmail.com> wrote:
> On Tue, Aug 28, 2018 at 11:04 AM, H.J. Lu <hongjiu...@intel.com> wrote:
>> With -mavx, for
>>
>> [hjl@gnu-cfl-1 skx-2]$ cat foo.i
>> extern float f;
>> extern double d;
>> extern int i;
>>
>> void
>> foo (void)
>> {
>>   d = f;
>>   f = i;
>> }
>>
>> we need to generate
>>
>>         vxorp[ds]       %xmmN, %xmmN, %xmmN
>>         ...
>>         vcvtss2sd       f(%rip), %xmmN, %xmmX
>>         ...
>>         vcvtsi2ss       i(%rip), %xmmN, %xmmY
>>
>> to avoid a partial XMM register stall.  This patch adds a pass to generate
>> a single
>>
>>         vxorps          %xmmN, %xmmN, %xmmN
>>
>> at function entry, which is shared by all SF and DF conversions, instead
>> of generating one
>>
>>         vxorp[ds]       %xmmN, %xmmN, %xmmN
>>
>> for each SF/DF conversion.
>>
>> Performance impacts on SPEC CPU 2017 rate runs with 1 copy, built with
>>
>> -Ofast -march=native -mfpmath=sse -fno-associative-math -funroll-loops
>>
>> are
>>
>> 1. On Broadwell server:
>>
>> 500.perlbench_r (-0.82%)
>> 502.gcc_r (0.73%)
>> 505.mcf_r (-0.24%)
>> 520.omnetpp_r (-2.22%)
>> 523.xalancbmk_r (-1.47%)
>> 525.x264_r (0.31%)
>> 531.deepsjeng_r (0.27%)
>> 541.leela_r (0.85%)
>> 548.exchange2_r (-0.11%)
>> 557.xz_r (-0.34%)
>> Geomean: (-0.23%)
>>
>> 503.bwaves_r (0.00%)
>> 507.cactuBSSN_r (-1.88%)
>> 508.namd_r (0.00%)
>> 510.parest_r (-0.56%)
>> 511.povray_r (0.49%)
>> 519.lbm_r (-1.28%)
>> 521.wrf_r (-0.28%)
>> 526.blender_r (0.55%)
>> 527.cam4_r (-0.20%)
>> 538.imagick_r (2.52%)
>> 544.nab_r (-0.18%)
>> 549.fotonik3d_r (-0.51%)
>> 554.roms_r (-0.22%)
>> Geomean: (0.00%)
>>
>> 2. On Skylake client:
>>
>> 500.perlbench_r (-0.29%)
>> 502.gcc_r (-0.36%)
>> 505.mcf_r (1.77%)
>> 520.omnetpp_r (-0.26%)
>> 523.xalancbmk_r (-3.69%)
>> 525.x264_r (-0.32%)
>> 531.deepsjeng_r (0.00%)
>> 541.leela_r (-0.46%)
>> 548.exchange2_r (0.00%)
>> 557.xz_r (0.00%)
>> Geomean: (-0.34%)
>>
>> 503.bwaves_r (0.00%)
>> 507.cactuBSSN_r (-0.56%)
>> 508.namd_r (0.87%)
>> 510.parest_r (0.00%)
>> 511.povray_r (-0.73%)
>> 519.lbm_r (0.84%)
>> 521.wrf_r (0.00%)
>> 526.blender_r (-0.81%)
>> 527.cam4_r (-0.43%)
>> 538.imagick_r (2.55%)
>> 544.nab_r (0.28%)
>> 549.fotonik3d_r (0.00%)
>> 554.roms_r (0.32%)
>> Geomean: (0.12%)
>>
>> 3. On Skylake server:
>>
>> 500.perlbench_r (-0.55%)
>> 502.gcc_r (0.69%)
>> 505.mcf_r (0.00%)
>> 520.omnetpp_r (-0.33%)
>> 523.xalancbmk_r (-0.21%)
>> 525.x264_r (-0.27%)
>> 531.deepsjeng_r (0.00%)
>> 541.leela_r (0.00%)
>> 548.exchange2_r (-0.11%)
>> 557.xz_r (0.00%)
>> Geomean: (0.00%)
>>
>> 503.bwaves_r (0.58%)
>> 507.cactuBSSN_r (0.00%)
>> 508.namd_r (0.00%)
>> 510.parest_r (0.18%)
>> 511.povray_r (-0.58%)
>> 519.lbm_r (0.25%)
>> 521.wrf_r (0.40%)
>> 526.blender_r (0.34%)
>> 527.cam4_r (0.19%)
>> 538.imagick_r (5.87%)
>> 544.nab_r (0.17%)
>> 549.fotonik3d_r (0.00%)
>> 554.roms_r (0.00%)
>> Geomean: (0.62%)
>>
>> On Skylake client, impacts on 538.imagick_r are
>>
>> size before:
>>
>>    text    data     bss     dec     hex filename
>> 2555577   10876    5576 2572029  273efd imagick_r.exe
>>
>> size after:
>>
>>    text    data     bss     dec     hex filename
>> 2511825   10876    5576 2528277  269415 imagick_r.exe
>>
>> number of vxorp[ds]:
>>
>> before          after           difference
>> 14570           4515            -69%
>>
>> OK for trunk?
>>
>> Thanks.
>>
>>
>> H.J.
>> ---
>> gcc/
>>
>> 2018-08-28  H.J. Lu  <hongjiu...@intel.com>
>>             Sunil K Pandey  <sunil.k.pan...@intel.com>
>>
>>         PR target/87007
>>         * config/i386/i386-passes.def: Add
>>         pass_remove_partial_avx_dependency.
>>         * config/i386/i386-protos.h
>>         (make_pass_remove_partial_avx_dependency): New.
>>         * config/i386/i386.c (make_pass_remove_partial_avx_dependency):
>>         New function.
>>         (pass_data_remove_partial_avx_dependency): New.
>>         (pass_remove_partial_avx_dependency): Likewise.
>>         (make_pass_remove_partial_avx_dependency): Likewise.
>>         * config/i386/i386.md (SF/DF conversion splitters): Disabled
>>         for TARGET_AVX.
>>
>> gcc/testsuite/
>>
>> 2018-08-28  H.J. Lu  <hongjiu...@intel.com>
>>             Sunil K Pandey  <sunil.k.pan...@intel.com>
>>
>>         PR target/87007
>>         * gcc.target/i386/pr87007.c: New file.
>
>
> PING:
>
> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01781.html
>

PING.
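
For anyone who wants to look at the generated code themselves, here is a
minimal self-contained variant of the quoted foo.i. It is only a sketch:
the globals are defined here rather than declared extern, and
foo_self_check is a hypothetical helper added for illustration; neither
is part of the patch or of gcc.target/i386/pr87007.c.

```c
#include <assert.h>

/* Same globals as the quoted foo.i, but defined here (the original
   declares them extern) so the file builds on its own.  */
float f;
double d;
int i;

/* The two conversions from the quoted test case.  */
void
foo (void)
{
  d = f;   /* SF -> DF; expected to use vcvtss2sd with -mavx.  */
  f = i;   /* SI -> SF; expected to use vcvtsi2ss with -mavx.  */
}

/* Hypothetical helper, not part of the patch: checks only that the
   conversions produce the right values, not which instructions
   the compiler chose.  */
int
foo_self_check (void)
{
  f = 1.5f;
  i = 7;
  foo ();
  assert (d == 1.5);
  assert (f == 7.0f);
  return 1;
}
```

Compiling this with something like "gcc -O2 -mavx -S" and searching the
output for vxorps, vcvtss2sd and vcvtsi2ss should show whether the
zeroing is emitted once or before each conversion. The exact output
depends on the GCC version and on -march/-mtune.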

-- 
H.J.
