On Wed, May 30, 2018 at 11:25 AM Richard Biener <richard.guent...@gmail.com> wrote:

> On Tue, May 29, 2018 at 5:24 PM Allan Sandfeld Jensen <li...@carewolf.com> wrote:
>
> > On Tuesday, 29 May 2018 16:57:56 CEST Richard Biener wrote:
> > >
> > > so the situation improves but isn't fully fixed (STLF issues maybe?)
> > >
> >
> > That raises the question if it helps in these cases even in -O3?
>
> That's a good question indeed.  We might end up (partly) BB vectorizing
> loop bodies that we'd otherwise loop vectorize with SLP.  Benchmarking
> with BB vectorization disabled at -O3+ isn't something I've done in the
> past.  I'm now doing a 2-run with -march=haswell -Ofast
> [-fno-tree-slp-vectorize] for the FP benchmarks.

                 Base      Base      Base       Peak      Peak      Peak
Benchmarks       Ref.  Run Time     Ratio       Ref.  Run Time     Ratio
-------------- ------ --------- ---------     ------ --------- ---------
410.bwaves      13590       178      76.4 *    13590       180      75.5 S
410.bwaves      13590       180      75.6 S    13590       179      76.0 *
416.gamess      19580       604      32.4 S    19580       576      34.0 S
416.gamess      19580       604      32.4 *    19580       575      34.0 *
433.milc         9180       339      27.1 *     9180       345      26.6 S
433.milc         9180       343      26.7 S     9180       343      26.8 *
434.zeusmp       9100       234      38.9 *     9100       234      38.9 S
434.zeusmp       9100       234      38.8 S     9100       234      38.9 *
435.gromacs      7140       251      28.5 *     7140       251      28.4 *
435.gromacs      7140       252      28.3 S     7140       252      28.3 S
436.cactusADM   11950       278      43.0 S    11950       222      53.8 S
436.cactusADM   11950       223      53.7 *    11950       221      54.1 *
437.leslie3d     9400       214      43.9 *     9400       215      43.6 *
437.leslie3d     9400       217      43.3 S     9400       222      42.4 S
444.namd         8020       302      26.5 S     8020       303      26.5 S
444.namd         8020       302      26.6 *     8020       303      26.5 *
447.dealII      11440       259      44.2 *    11440       246      46.6 *
447.dealII      11440       259      44.1 S    11440       246      46.6 S
450.soplex       8340       219      38.0 *     8340       219      38.0 *
450.soplex       8340       221      37.7 S     8340       221      37.7 S
453.povray       5320       108      49.2 *     5320       109      48.7 S
453.povray       5320       108      49.1 S     5320       109      48.8 *
454.calculix     8250       270      30.6 *     8250       269      30.6 *
454.calculix     8250       271      30.5 S     8250       270      30.5 S
459.GemsFDTD    10610       308      34.5 S    10610       306      34.7 S
459.GemsFDTD    10610       306      34.7 *    10610       306      34.7 *
465.tonto        9840       428      23.0 S     9840       423      23.3 *
465.tonto        9840       426      23.1 *     9840       423      23.2 S
470.lbm         13740       253      54.4 S    13740       252      54.5 *
470.lbm         13740       252      54.5 *    13740       252      54.5 S
481.wrf         11170       265      42.1 *    11170       265      42.2 S
481.wrf         11170       266      42.1 S    11170       264      42.3 *
482.sphinx3     19490       401      48.6 *    19490       402      48.5 S
482.sphinx3     19490       405      48.1 S    19490       399      48.9 *

so we can indeed see similar detrimental effects on 416.gamess; 447.dealII
seems to improve with BB vectorization.  That means the 416.gamess slowdown
is definitely worth investigating since it reproduces with both AVX128 and
AVX256 and with loop vectorization.  I'll open a bug for it.

> Note that there were some cases where disabling vectorization wholly
> improved things.
>
> > Anyway it doesn't look good for it. Did the binary size at least improve with
> > prefer-avx128, or was that also worse or insignificant?
>
> Similar to the AVX256 results.  I guess where AVX256 applied we now simply
> do two vector ops with AVX128.
>
> Richard.
>
> > 'Allan
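
For anyone who wants to quantify the differences in the table above, here is a
minimal sketch that computes the relative run-time change between the Base and
Peak columns for a few benchmarks, using the selected (*) runs. The function
name pct_change and the SELECTED_RUNS dictionary are made up for illustration;
the run times are copied from the table. The message does not say which column
was built with -fno-tree-slp-vectorize, so the sketch only labels them base and
peak.

```python
# Run times in seconds of the selected (*) runs, copied from the table.
# Tuple layout is (base, peak). Which flag set each column used is not
# stated in the message, so no assumption is made about that here.
SELECTED_RUNS = {
    "416.gamess": (604, 575),
    "433.milc": (339, 343),
    "436.cactusADM": (223, 221),
    "447.dealII": (259, 246),
}


def pct_change(base_seconds, peak_seconds):
    """Relative run-time change from base to peak, in percent.

    A negative value means the peak configuration ran faster.
    """
    return (peak_seconds - base_seconds) / base_seconds * 100.0


if __name__ == "__main__":
    for name, (base, peak) in SELECTED_RUNS.items():
        print(f"{name:14s} {pct_change(base, peak):+6.1f}%")
```

With only two runs per configuration the results are noisy (436.cactusADM has
one Base run at 278 s and another at 223 s), so small differences should not
be over-interpreted.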