On Tue, Aug 28, 2018 at 11:04 AM, H.J. Lu <hongjiu...@intel.com> wrote:
> With -mavx, for
>
> [hjl@gnu-cfl-1 skx-2]$ cat foo.i
> extern float f;
> extern double d;
> extern int i;
>
> void
> foo (void)
> {
>   d = f;
>   f = i;
> }
>
> we need to generate
>
>         vxorp[ds]       %xmmN, %xmmN, %xmmN
>         ...
>         vcvtss2sd       f(%rip), %xmmN, %xmmX
>         ...
>         vcvtsi2ss       i(%rip), %xmmN, %xmmY
>
> to avoid a partial XMM register stall.  This patch adds a pass to generate
> a single
>
>         vxorps          %xmmN, %xmmN, %xmmN
>
> at function entry, which is shared by all SF and DF conversions, instead
> of generating one
>
>         vxorp[ds]       %xmmN, %xmmN, %xmmN
>
> for each SF/DF conversion.
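As an illustration only (this is not part of the patch or its testcase), here is a slightly larger variant of foo.i. The globals `l` and `d2`, the function names, and the initial values are hypothetical additions so that the file links and runs on its own. Each of the three conversions writes only the low element of its destination XMM register, so each one needs a zeroed source register to break the dependency on the register's previous contents. With the proposed pass, all three would be expected to share one zeroing instruction. The mnemonics in the comments are what GCC typically emits with -mavx; the exact output depends on compiler version and tuning.

```c
#include <assert.h>

/* Same roles as f, d and i in foo.i, but defined here (not extern)
   so the example is self-contained.  */
float f = 1.5f;
double d;
int i = 7;

/* Hypothetical extra globals, not in the original testcase.  */
long l = 3;
double d2;

/* Three scalar conversions in one function.  Each destination is
   only partially written, hence the zeroed %xmmN source operand
   described in the quoted text.  */
void
convert_all (void)
{
  d = f;   /* float -> double, typically vcvtss2sd */
  f = i;   /* int -> float, typically vcvtsi2ss */
  d2 = l;  /* long -> double, typically vcvtsi2sdq on x86-64 */
}

/* Sanity check that the conversions produce the expected values.  */
void
check_convert_all (void)
{
  convert_all ();
  assert (d == 1.5);
  assert (f == 7.0f);
  assert (d2 == 3.0);
}
```

Compiling this with something like `gcc -O2 -mavx -S` and counting the `vxorp[ds]` instructions in the output is one way to see whether the zeroing is shared or repeated per conversion.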
>
> Performance impacts on SPEC CPU 2017 rate with 1 copy using
>
> -Ofast -march=native -mfpmath=sse -fno-associative-math -funroll-loops
>
> are
>
> 1. On Broadwell server:
>
> 500.perlbench_r (-0.82%)
> 502.gcc_r (0.73%)
> 505.mcf_r (-0.24%)
> 520.omnetpp_r (-2.22%)
> 523.xalancbmk_r (-1.47%)
> 525.x264_r (0.31%)
> 531.deepsjeng_r (0.27%)
> 541.leela_r (0.85%)
> 548.exchange2_r (-0.11%)
> 557.xz_r (-0.34%)
> Geomean: (-0.23%)
>
> 503.bwaves_r (0.00%)
> 507.cactuBSSN_r (-1.88%)
> 508.namd_r (0.00%)
> 510.parest_r (-0.56%)
> 511.povray_r (0.49%)
> 519.lbm_r (-1.28%)
> 521.wrf_r (-0.28%)
> 526.blender_r (0.55%)
> 527.cam4_r (-0.20%)
> 538.imagick_r (2.52%)
> 544.nab_r (-0.18%)
> 549.fotonik3d_r (-0.51%)
> 554.roms_r (-0.22%)
> Geomean: (0.00%)
>
> 2. On Skylake client:
>
> 500.perlbench_r (-0.29%)
> 502.gcc_r (-0.36%)
> 505.mcf_r (1.77%)
> 520.omnetpp_r (-0.26%)
> 523.xalancbmk_r (-3.69%)
> 525.x264_r (-0.32%)
> 531.deepsjeng_r (0.00%)
> 541.leela_r (-0.46%)
> 548.exchange2_r (0.00%)
> 557.xz_r (0.00%)
> Geomean: (-0.34%)
>
> 503.bwaves_r (0.00%)
> 507.cactuBSSN_r (-0.56%)
> 508.namd_r (0.87%)
> 510.parest_r (0.00%)
> 511.povray_r (-0.73%)
> 519.lbm_r (0.84%)
> 521.wrf_r (0.00%)
> 526.blender_r (-0.81%)
> 527.cam4_r (-0.43%)
> 538.imagick_r (2.55%)
> 544.nab_r (0.28%)
> 549.fotonik3d_r (0.00%)
> 554.roms_r (0.32%)
> Geomean: (0.12%)
>
> 3. On Skylake server:
>
> 500.perlbench_r (-0.55%)
> 502.gcc_r (0.69%)
> 505.mcf_r (0.00%)
> 520.omnetpp_r (-0.33%)
> 523.xalancbmk_r (-0.21%)
> 525.x264_r (-0.27%)
> 531.deepsjeng_r (0.00%)
> 541.leela_r (0.00%)
> 548.exchange2_r (-0.11%)
> 557.xz_r (0.00%)
> Geomean: (0.00%)
>
> 503.bwaves_r (0.58%)
> 507.cactuBSSN_r (0.00%)
> 508.namd_r (0.00%)
> 510.parest_r (0.18%)
> 511.povray_r (-0.58%)
> 519.lbm_r (0.25%)
> 521.wrf_r (0.40%)
> 526.blender_r (0.34%)
> 527.cam4_r (0.19%)
> 538.imagick_r (5.87%)
> 544.nab_r (0.17%)
> 549.fotonik3d_r (0.00%)
> 554.roms_r (0.00%)
> Geomean: (0.62%)
>
> On Skylake client, impacts on 538.imagick_r are
>
> size before:
>
>    text    data     bss     dec     hex filename
> 2555577   10876    5576 2572029  273efd imagick_r.exe
>
> size after:
>
>    text    data     bss     dec     hex filename
> 2511825   10876    5576 2528277  269415 imagick_r.exe
>
> number of vxorp[ds]:
>
> before          after           difference
> 14570           4515            -69%
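The figures above can be cross-checked with a few lines of C. This is only a sketch of the arithmetic; the helper name `percent_change` is made up for the example, and the inputs are the text sizes and vxorp[ds] counts quoted above.

```c
#include <assert.h>

/* Hypothetical helper: relative change from BEFORE to AFTER,
   in percent.  Negative means a reduction.  */
double
percent_change (long before, long after)
{
  return 100.0 * (double) (after - before) / (double) before;
}

/* Numbers taken from the size and vxorp[ds] tables above.  */
static const long text_before = 2555577;
static const long text_after = 2511825;
static const long vxor_before = 14570;
static const long vxor_after = 4515;

void
check_imagick_numbers (void)
{
  /* text shrinks by 43752 bytes, roughly 1.7%.  */
  assert (text_before - text_after == 43752);
  double text_pct = percent_change (text_before, text_after);
  assert (text_pct < -1.71 && text_pct > -1.72);

  /* 10055 fewer vxorp[ds], which rounds to the quoted -69%.  */
  assert (vxor_before - vxor_after == 10055);
  double vxor_pct = percent_change (vxor_before, vxor_after);
  assert (vxor_pct < -69.0 && vxor_pct > -69.1);
}
```

So the removed zeroing instructions correspond to a text-size reduction of about 43 KB in imagick_r.exe.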
>
> OK for trunk?
>
> Thanks.
>
>
> H.J.
> ---
> gcc/
>
> 2018-08-28  H.J. Lu  <hongjiu...@intel.com>
>             Sunil K Pandey  <sunil.k.pan...@intel.com>
>
>         PR target/87007
>         * config/i386/i386-passes.def: Add
>         pass_remove_partial_avx_dependency.
>         * config/i386/i386-protos.h
>         (make_pass_remove_partial_avx_dependency): New.
>         * config/i386/i386.c (make_pass_remove_partial_avx_dependency):
>         New function.
>         (pass_data_remove_partial_avx_dependency): New.
>         (pass_remove_partial_avx_dependency): Likewise.
>         (make_pass_remove_partial_avx_dependency): Likewise.
>         * config/i386/i386.md (SF/DF conversion splitters): Disabled
>         for TARGET_AVX.
>
> gcc/testsuite/
>
> 2018-08-28  H.J. Lu  <hongjiu...@intel.com>
>             Sunil K Pandey  <sunil.k.pan...@intel.com>
>
>         PR target/87007
>         * gcc.target/i386/pr87007.c: New file.


PING:

https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01781.html



-- 
H.J.
