On Tue, Aug 28, 2018 at 11:04 AM, H.J. Lu <hongjiu...@intel.com> wrote:
> With -mavx, for
>
> [hjl@gnu-cfl-1 skx-2]$ cat foo.i
> extern float f;
> extern double d;
> extern int i;
>
> void
> foo (void)
> {
>   d = f;
>   f = i;
> }
>
> we need to generate
>
>         vxorp[ds]       %xmmN, %xmmN, %xmmN
>         ...
>         vcvtss2sd       f(%rip), %xmmN, %xmmX
>         ...
>         vcvtsi2ss       i(%rip), %xmmN, %xmmY
>
> to avoid partial XMM register stall.  This patch adds a pass to generate
> a single
>
>         vxorps          %xmmN, %xmmN, %xmmN
>
> at function entry, which is shared by all SF and DF conversions, instead
> of generating one
>
>         vxorp[ds]       %xmmN, %xmmN, %xmmN
>
> for each SF/DF conversion.
>
> Performance impacts on SPEC CPU 2017 rate with 1 copy using
>
> -Ofast -march=native -mfpmath=sse -fno-associative-math -funroll-loops
>
> are
>
> 1. On Broadwell server:
>
> 500.perlbench_r (-0.82%)
> 502.gcc_r (0.73%)
> 505.mcf_r (-0.24%)
> 520.omnetpp_r (-2.22%)
> 523.xalancbmk_r (-1.47%)
> 525.x264_r (0.31%)
> 531.deepsjeng_r (0.27%)
> 541.leela_r (0.85%)
> 548.exchange2_r (-0.11%)
> 557.xz_r (-0.34%)
> Geomean: (-0.23%)
>
> 503.bwaves_r (0.00%)
> 507.cactuBSSN_r (-1.88%)
> 508.namd_r (0.00%)
> 510.parest_r (-0.56%)
> 511.povray_r (0.49%)
> 519.lbm_r (-1.28%)
> 521.wrf_r (-0.28%)
> 526.blender_r (0.55%)
> 527.cam4_r (-0.20%)
> 538.imagick_r (2.52%)
> 544.nab_r (-0.18%)
> 549.fotonik3d_r (-0.51%)
> 554.roms_r (-0.22%)
> Geomean: (0.00%)
>
> 2. On Skylake client:
>
> 500.perlbench_r (-0.29%)
> 502.gcc_r (-0.36%)
> 505.mcf_r (1.77%)
> 520.omnetpp_r (-0.26%)
> 523.xalancbmk_r (-3.69%)
> 525.x264_r (-0.32%)
> 531.deepsjeng_r (0.00%)
> 541.leela_r (-0.46%)
> 548.exchange2_r (0.00%)
> 557.xz_r (0.00%)
> Geomean: (-0.34%)
>
> 503.bwaves_r (0.00%)
> 507.cactuBSSN_r (-0.56%)
> 508.namd_r (0.87%)
> 510.parest_r (0.00%)
> 511.povray_r (-0.73%)
> 519.lbm_r (0.84%)
> 521.wrf_r (0.00%)
> 526.blender_r (-0.81%)
> 527.cam4_r (-0.43%)
> 538.imagick_r (2.55%)
> 544.nab_r (0.28%)
> 549.fotonik3d_r (0.00%)
> 554.roms_r (0.32%)
> Geomean: (0.12%)
>
> 3. On Skylake server:
>
> 500.perlbench_r (-0.55%)
> 502.gcc_r (0.69%)
> 505.mcf_r (0.00%)
> 520.omnetpp_r (-0.33%)
> 523.xalancbmk_r (-0.21%)
> 525.x264_r (-0.27%)
> 531.deepsjeng_r (0.00%)
> 541.leela_r (0.00%)
> 548.exchange2_r (-0.11%)
> 557.xz_r (0.00%)
> Geomean: (0.00%)
>
> 503.bwaves_r (0.58%)
> 507.cactuBSSN_r (0.00%)
> 508.namd_r (0.00%)
> 510.parest_r (0.18%)
> 511.povray_r (-0.58%)
> 519.lbm_r (0.25%)
> 521.wrf_r (0.40%)
> 526.blender_r (0.34%)
> 527.cam4_r (0.19%)
> 538.imagick_r (5.87%)
> 544.nab_r (0.17%)
> 549.fotonik3d_r (0.00%)
> 554.roms_r (0.00%)
> Geomean: (0.62%)
>
> On Skylake client, impacts on 538.imagick_r are
>
> size before:
>
>    text    data     bss     dec     hex filename
> 2555577   10876    5576 2572029  273efd imagick_r.exe
>
> size after:
>
>    text    data     bss     dec     hex filename
> 2511825   10876    5576 2528277  269415 imagick_r.exe
>
> number of vxorp[ds]:
>
> before          after           difference
> 14570           4515            -69%
>
> OK for trunk?
>
> Thanks.
>
>
> H.J.
> ---
> gcc/
>
> 2018-08-28  H.J. Lu  <hongjiu...@intel.com>
>             Sunil K Pandey  <sunil.k.pan...@intel.com>
>
>         PR target/87007
>         * config/i386/i386-passes.def: Add
>         pass_remove_partial_avx_dependency.
>         * config/i386/i386-protos.h
>         (make_pass_remove_partial_avx_dependency): New.
>         * config/i386/i386.c (make_pass_remove_partial_avx_dependency):
>         New function.
>         (pass_data_remove_partial_avx_dependency): New.
>         (pass_remove_partial_avx_dependency): Likewise.
>         (make_pass_remove_partial_avx_dependency): Likewise.
>         * config/i386/i386.md (SF/DF conversion splitters): Disabled
>         for TARGET_AVX.
>
> gcc/testsuite/
>
> 2018-08-28  H.J. Lu  <hongjiu...@intel.com>
>             Sunil K Pandey  <sunil.k.pan...@intel.com>
>
>         PR target/87007
>         * gcc.target/i386/pr87007.c: New file.
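
For anyone who wants to look at the vxorp[ds] count on a small input, the
foo.i case quoted above can be turned into a stand-alone file roughly as
follows.  This is only a sketch: the definitions of f, d and i and the
check_foo helper are assumptions added for illustration (foo.i only declares
the variables), and none of this is taken from the patch or from
gcc.target/i386/pr87007.c.

```c
#include <assert.h>

/* Definitions added for illustration; foo.i in the quoted message only
   declares these as extern.  */
float f;
double d;
int i;

/* Same body as foo in foo.i: one SF->DF conversion and one SI->SF
   conversion, which is where the vxorp[ds]/vcvt* sequences come from
   when compiling with -mavx.  */
void
foo (void)
{
  d = f;
  f = i;
}

/* Hypothetical helper, not part of the patch: set the inputs, run foo,
   and check that both conversions produced the expected values.  */
void
check_foo (float f_in, int i_in)
{
  f = f_in;
  i = i_in;
  foo ();
  assert (d == (double) f_in);
  assert (f == (float) i_in);
}
```

Compiling this with something like "gcc -O2 -mavx -S" and searching the
assembly output for vxorp should show how many zeroing instructions are
emitted for the two conversions; the exact sequence will depend on the
compiler version and options used.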

PING:

https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01781.html

--
H.J.