On Tue, Sep 4, 2018 at 9:01 AM, H.J. Lu <hjl.to...@gmail.com> wrote:
> On Tue, Aug 28, 2018 at 11:04 AM, H.J. Lu <hongjiu...@intel.com> wrote:
>> With -mavx, for
>>
>> [hjl@gnu-cfl-1 skx-2]$ cat foo.i
>> extern float f;
>> extern double d;
>> extern int i;
>>
>> void
>> foo (void)
>> {
>>   d = f;
>>   f = i;
>> }
>>
>> we need to generate
>>
>>         vxorp[ds]       %xmmN, %xmmN, %xmmN
>>         ...
>>         vcvtss2sd       f(%rip), %xmmN, %xmmX
>>         ...
>>         vcvtsi2ss       i(%rip), %xmmN, %xmmY
>>
>> to avoid partial XMM register stall.  This patch adds a pass to generate
>> a single
>>
>>         vxorps          %xmmN, %xmmN, %xmmN
>>
>> at function entry, which is shared by all SF and DF conversions, instead
>> of generating one
>>
>>         vxorp[ds]       %xmmN, %xmmN, %xmmN
>>
>> for each SF/DF conversion.
>>
>> Performance impacts on SPEC CPU 2017 rate with 1 copy using
>>
>> -Ofast -march=native -mfpmath=sse -fno-associative-math -funroll-loops
>>
>> are
>>
>> 1. On Broadwell server:
>>
>> 500.perlbench_r (-0.82%)
>> 502.gcc_r (0.73%)
>> 505.mcf_r (-0.24%)
>> 520.omnetpp_r (-2.22%)
>> 523.xalancbmk_r (-1.47%)
>> 525.x264_r (0.31%)
>> 531.deepsjeng_r (0.27%)
>> 541.leela_r (0.85%)
>> 548.exchange2_r (-0.11%)
>> 557.xz_r (-0.34%)
>> Geomean: (-0.23%)
>>
>> 503.bwaves_r (0.00%)
>> 507.cactuBSSN_r (-1.88%)
>> 508.namd_r (0.00%)
>> 510.parest_r (-0.56%)
>> 511.povray_r (0.49%)
>> 519.lbm_r (-1.28%)
>> 521.wrf_r (-0.28%)
>> 526.blender_r (0.55%)
>> 527.cam4_r (-0.20%)
>> 538.imagick_r (2.52%)
>> 544.nab_r (-0.18%)
>> 549.fotonik3d_r (-0.51%)
>> 554.roms_r (-0.22%)
>> Geomean: (0.00%)
>>
>> 2. On Skylake client:
>>
>> 500.perlbench_r (-0.29%)
>> 502.gcc_r (-0.36%)
>> 505.mcf_r (1.77%)
>> 520.omnetpp_r (-0.26%)
>> 523.xalancbmk_r (-3.69%)
>> 525.x264_r (-0.32%)
>> 531.deepsjeng_r (0.00%)
>> 541.leela_r (-0.46%)
>> 548.exchange2_r (0.00%)
>> 557.xz_r (0.00%)
>> Geomean: (-0.34%)
>>
>> 503.bwaves_r (0.00%)
>> 507.cactuBSSN_r (-0.56%)
>> 508.namd_r (0.87%)
>> 510.parest_r (0.00%)
>> 511.povray_r (-0.73%)
>> 519.lbm_r (0.84%)
>> 521.wrf_r (0.00%)
>> 526.blender_r (-0.81%)
>> 527.cam4_r (-0.43%)
>> 538.imagick_r (2.55%)
>> 544.nab_r (0.28%)
>> 549.fotonik3d_r (0.00%)
>> 554.roms_r (0.32%)
>> Geomean: (0.12%)
>>
>> 3. On Skylake server:
>>
>> 500.perlbench_r (-0.55%)
>> 502.gcc_r (0.69%)
>> 505.mcf_r (0.00%)
>> 520.omnetpp_r (-0.33%)
>> 523.xalancbmk_r (-0.21%)
>> 525.x264_r (-0.27%)
>> 531.deepsjeng_r (0.00%)
>> 541.leela_r (0.00%)
>> 548.exchange2_r (-0.11%)
>> 557.xz_r (0.00%)
>> Geomean: (0.00%)
>>
>> 503.bwaves_r (0.58%)
>> 507.cactuBSSN_r (0.00%)
>> 508.namd_r (0.00%)
>> 510.parest_r (0.18%)
>> 511.povray_r (-0.58%)
>> 519.lbm_r (0.25%)
>> 521.wrf_r (0.40%)
>> 526.blender_r (0.34%)
>> 527.cam4_r (0.19%)
>> 538.imagick_r (5.87%)
>> 544.nab_r (0.17%)
>> 549.fotonik3d_r (0.00%)
>> 554.roms_r (0.00%)
>> Geomean: (0.62%)
>>
>> On Skylake client, impacts on 538.imagick_r are
>>
>> size before:
>>
>>    text    data     bss     dec     hex filename
>> 2555577   10876    5576 2572029  273efd imagick_r.exe
>>
>> size after:
>>
>>    text    data     bss     dec     hex filename
>> 2511825   10876    5576 2528277  269415 imagick_r.exe
>>
>> number of vxorp[ds]:
>>
>> before          after           difference
>> 14570           4515            -69%
>>
>> OK for trunk?
>>
>> Thanks.
>>
>>
>> H.J.
>> ---
>> gcc/
>>
>> 2018-08-28  H.J. Lu  <hongjiu...@intel.com>
>>             Sunil K Pandey  <sunil.k.pan...@intel.com>
>>
>>         PR target/87007
>>         * config/i386/i386-passes.def: Add
>>         pass_remove_partial_avx_dependency.
>>         * config/i386/i386-protos.h
>>         (make_pass_remove_partial_avx_dependency): New.
>>         * config/i386/i386.c (make_pass_remove_partial_avx_dependency):
>>         New function.
>>         (pass_data_remove_partial_avx_dependency): New.
>>         (pass_remove_partial_avx_dependency): Likewise.
>>         (make_pass_remove_partial_avx_dependency): Likewise.
>>         * config/i386/i386.md (SF/DF conversion splitters): Disabled
>>         for TARGET_AVX.
>>
>> gcc/testsuite/
>>
>> 2018-08-28  H.J. Lu  <hongjiu...@intel.com>
>>             Sunil K Pandey  <sunil.k.pan...@intel.com>
>>
>>         PR target/87007
>>         * gcc.target/i386/pr87007.c: New file.
>
>
> PING:
>
> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01781.html
>
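
For anyone who wants to look at the code generation described in the quoted
message, here is a minimal sketch of a self-contained variant of foo.i.  The
variable definitions and their initial values are assumptions added for
illustration (the quoted foo.i only declares f, d and i as extern); they are
there so the file can be compiled and linked on its own.

```c
/* Self-contained variant of the quoted foo.i.  The definitions and
   initial values below are assumptions; the original declares these
   three variables extern and leaves them undefined.  */
float f = 1.5f;
double d;
int i = 7;

/* Each assignment is a scalar conversion.  With -mavx the conversion
   instruction writes only the low element of its destination XMM
   register and merges the rest from a source register, which is why
   the quoted message zeroes that register with vxorp[ds] first.  */
void
foo (void)
{
  d = f;	/* float -> double, vcvtss2sd in the quoted asm.  */
  f = i;	/* int -> float, vcvtsi2ss in the quoted asm.  */
}
```

Compiling this with something like "gcc -O2 -mavx -S" and counting the
vxorp[ds] lines in the resulting assembly, with and without the patch, is one
way to compare.  The exact instruction sequence depends on the compiler
version and tuning flags, so no expected output is given here.
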
PING.

-- 
H.J.