On Mon, 12 Mar 2018, Aaron Sawdey wrote:

> Looking at CPU2017 results for different reassociation widths, things
> have shifted since I last looked at this with CPU2006 in early gcc7
> timeframe. Best thing to do seems to be to set reassociation width to 1
> for all integer modes, which is what the attached patch does.
>
> I also tried setting width to 1 for float modes PLUS_EXPR as this patch
> did for aarch64 but this does not seem to be helpful for power8.
> https://gcc.gnu.org/ml/gcc-patches/2018-02/msg01271.html
>
>
> Results below are % performance improvement on power8 comparing trunk
> with the attached patch vs trunk with --param tree-reassoc-width=1 to
> disable parallel reassociation for everything (first column of results)
> and trunk unmodified (second column of results).
>
> CPU2017 component      vs width=1   vs trunk
> 500.perlbench_r          -0.36%      -0.15%
> 502.gcc_r                 0.06%       0.04%
> 505.mcf_r                 0.32%       0.24%
> 520.omnetpp_r             0.57%      -0.95%
> 523.xalancbmk_r           1.45%       1.04%
> 525.x264_r               -0.05%       0.09%
> 531.deepsjeng_r           0.04%       0.09%
> 541.leela_r               0.10%       0.72%
> 548.exchange2_r           0.08%       0.73%
> 557.xz_r                  0.09%       2.12%
> CPU2017 int geo mean      0.23%       0.40%
> 503.bwaves_r              0.00%       0.01%
> 507.cactuBSSN_r           0.05%      -0.02%
> 508.namd_r                0.00%       0.00%
> 510.parest_r             -0.01%       0.20%
> 511.povray_r              0.03%      -0.24%
> 519.lbm_r                -0.04%      -0.16%
> 521.wrf_r                -0.01%      -0.56%
> 526.blender_r            -0.82%      -0.47%
> 527.cam4_r               -0.18%       0.06%
> 538.imagick_r            -0.02%       0.01%
> 544.nab_r                 0.00%       0.23%
> 549.fotonik3d_r           0.24%       0.54%
> 554.roms_r               -0.05%       0.03%
> CPU2017 fp geo mean      -0.06%      -0.03%
>
> Bottom line is net improvement for CPU2017 int compared with either
> current trunk, or disabling parallel reassociation. For CPU2017 fp,
> very small overall degradation.
>
> Currently doing regstrap on ppc64le, ok for trunk if results look good?

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 258101)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -10006,7 +10006,7 @@
           if (VECTOR_MODE_P (mode))
             return 4;
           if (INTEGRAL_MODE_P (mode))
-            return opc == MULT_EXPR ? 4 : 6;
+            return 1;
           if (FLOAT_MODE_P (mode))
             return 4;
           break;

so the original widths were very large (IMHO), did you try reducing
width to, say, 2?

In your numbers I see mostly noise but 2% regression for 557.xz_r
and 1% for 523.xalancbmk_r.  Maybe POWER machines give very stable
performance measurement results but from my experience on x86_64
anything < 1% is just noise...

Richard.

> Thanks!
>     Aaron
>
> 2018-03-12  Aaron Sawdey  <acsaw...@linux.vnet.ibm.com>
>
> 	PR target/84743
> 	* config/rs6000/rs6000.c (rs6000_reassociation_width): Disable parallel
> 	reassociation for int modes.
>
>

-- 
Richard Biener <rguent...@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nuernberg)
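
For anyone following the thread who has not looked at the reassociation
pass: a rough source-level illustration of what the width controls. This
is only a sketch. The real transformation happens on GIMPLE inside the
compiler, and the function names sum8_width1 and sum8_width2 are made up
for this example. Unsigned types are used so that wraparound is well
defined and both forms always give the same result.

```c
#include <stdint.h>

/* Width 1: a single serial chain.  Each add depends on the previous
   one, so the critical path is 7 adds long.  */
uint32_t
sum8_width1 (const uint32_t v[8])
{
  return ((((((v[0] + v[1]) + v[2]) + v[3]) + v[4]) + v[5]) + v[6]) + v[7];
}

/* Width 2: two independent chains that are combined at the end.
   The critical path drops to 4 adds, at the cost of keeping one
   more partial sum live in a register.  */
uint32_t
sum8_width2 (const uint32_t v[8])
{
  uint32_t a = ((v[0] + v[1]) + v[2]) + v[3];
  uint32_t b = ((v[4] + v[5]) + v[6]) + v[7];
  return a + b;
}
```

Both functions compute the same sum, and only the shape of the dependency
graph differs. Whether the shorter critical path pays for the extra
register pressure is what the benchmark numbers above are trying to
measure.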