On Tue, May 29, 2018 at 11:32 AM Richard Biener <richard.guent...@gmail.com> wrote:
> On Mon, May 28, 2018 at 5:50 PM Allan Sandfeld Jensen <li...@carewolf.com> wrote:
> > On Monday, 28 May 2018 12:58:20 CEST Richard Biener wrote:
> > > compile-time effects of the patch on that.  Embedded folks may want to
> > > run their favorite benchmark and report results as well.
> > >
> > > So I did a -O2 -march=haswell [-ftree-slp-vectorize] SPEC CPU 2006
> > > compile and run and the compile-time effect where measurable (SPEC
> > > records on a second granularity) is within one second per benchmark
> > > apart from 410.bwaves (from 3s to 5s) and 481.wrf (76s to 78s).
> > > Performance-wise I notice significant slowdowns for SPEC FP and some
> > > for SPEC INT (I only did a train run so far).  I'll re-run with ref
> > > input now and will post those numbers.
> > >
> > If you continue to see slowdowns, could you check with either no avx, or
> > with -mprefer-avx128? The occasional AVX256 instructions might be
> > downclocking the CPU. But yes that would be a problem for this change on
> > its own.
>
> So here's a complete two-run with ref input, peak is -O2 -march=haswell
> -ftree-slp-vectorize.  It confirms the slowdowns in SPEC FP but not in
> SPEC INT.  You are right that using AVX256 (or AVX512) might be
> problematic on its own, but that is not restricted to -O2
> -ftree-slp-vectorize; it also applies to -O3.  I will re-benchmark the
> SPEC FP part with -mprefer-avx128 to see if that is the issue.  Note I did
> not use any -ffast-math flags in the experiment - those are as "unlikely"
> as using -march=native together with -O2.  In theory another issue is the
> ability to debug code.
>
>                  Base      Base      Base       Peak      Peak      Peak
> Benchmarks       Ref.  Run Time     Ratio       Ref.  Run Time     Ratio
> -------------- ------ --------- ---------     ------ --------- ---------
> 410.bwaves      13590       362      37.5 *    13590       370      36.7 *
> 410.bwaves      13590       365      37.2 S    13590       377      36.0 S
> 416.gamess      19580       558      35.1 *    19580       598      32.7 *
> 416.gamess      19580       560      35.0 S    19580       600      32.6 S
> 433.milc         9180       331      27.8 S     9180       374      24.6 *
> 433.milc         9180       331      27.8 *     9180       383      24.0 S
> 434.zeusmp       9100       301      30.2 S     9100       301      30.2 *
> 434.zeusmp       9100       301      30.2 *     9100       302      30.1 S
> 435.gromacs      7140       300      23.8 S     7140       303      23.6 S
> 435.gromacs      7140       298      23.9 *     7140       301      23.8 *
> 436.cactusADM   11950       495      24.1 S    11950       482      24.8 *
> 436.cactusADM   11950       486      24.6 *    11950       484      24.7 S
> 437.leslie3d     9400       289      32.5 *     9400       288      32.6 *
> 437.leslie3d     9400       301      31.3 S     9400       289      32.5 S
> 444.namd         8020       301      26.6 *     8020       301      26.6 *
> 444.namd         8020       301      26.6 S     8020       301      26.6 S
> 447.dealII      11440       255      44.9 *    11440       252      45.3 *
> 447.dealII      11440       255      44.9 S    11440       253      45.3 S
> 450.soplex       8340       212      39.4 S     8340       213      39.1 S
> 450.soplex       8340       211      39.5 *     8340       211      39.5 *
> 453.povray       5320       111      47.9 S     5320       113      47.0 S
> 453.povray       5320       111      48.0 *     5320       113      47.2 *
> 454.calculix     8250       748      11.0 *     8250       835      9.88 *
> 454.calculix     8250       748      11.0 S     8250       835      9.88 S
> 459.GemsFDTD    10610       324      32.8 S    10610       324      32.8 S
> 459.GemsFDTD    10610       323      32.9 *    10610       323      32.9 *
> 465.tonto        9840       449      21.9 S     9840       469      21.0 *
> 465.tonto        9840       446      22.0 *     9840       469      21.0 S
> 470.lbm         13740       253      54.3 *    13740       255      53.9 S
> 470.lbm         13740       253      54.2 S    13740       254      54.2 *
> 481.wrf         11170       415      26.9 *    11170       416      26.9 S
> 481.wrf         11170       417      26.8 S    11170       416      26.9 *
> 482.sphinx3     19490       456      42.7 *    19490       465      41.9 *
> 482.sphinx3     19490       464      42.0 S    19490       468      41.6 S

Numbers with -mprefer-avx128:

                 Base      Base      Base       Peak      Peak      Peak
Benchmarks       Ref.  Run Time     Ratio       Ref.  Run Time     Ratio
-------------- ------ --------- ---------     ------ --------- ---------
410.bwaves                                     13590       365      37.2 *
410.bwaves                                     13590       374      36.4 S
416.gamess                                     19580       596      32.9 *
416.gamess                                     19580       596      32.8 S
433.milc                                        9180       378      24.3 S
433.milc                                        9180       375      24.5 *
434.zeusmp                                      9100       302      30.1 S
434.zeusmp                                      9100       302      30.2 *
435.gromacs                                     7140       299      23.9 *
435.gromacs                                     7140       299      23.9 S
436.cactusADM                                  11950       483      24.7 S
436.cactusADM                                  11950       482      24.8 *
437.leslie3d                                    9400       290      32.5 *
437.leslie3d                                    9400       302      31.1 S
444.namd                                        8020       301      26.6 *
444.namd                                        8020       301      26.6 S
447.dealII                                     11440       253      45.2 *
447.dealII                                     11440       253      45.2 S
450.soplex                                      8340       212      39.3 S
450.soplex                                      8340       211      39.5 *
454.calculix                                    8250       750      11.0 *
454.calculix                                    8250       750      11.0 S
459.GemsFDTD                                   10610       323      32.9 *
459.GemsFDTD                                   10610       323      32.8 S
465.tonto                                       9840       466      21.1 *
465.tonto                                       9840       466      21.1 S
470.lbm                                        13740       254      54.2 *
470.lbm                                        13740       255      54.0 S
481.wrf                                        11170       417      26.8 *
481.wrf                                        11170       417      26.8 S
482.sphinx3                                    19490       465      41.9 *
482.sphinx3                                    19490       473      41.2 S

so the situation improves but isn't fully fixed (STLF issues maybe?)

>                  Base      Base      Base       Peak      Peak      Peak
> Benchmarks       Ref.  Run Time     Ratio       Ref.  Run Time     Ratio
> -------------- ------ --------- ---------     ------ --------- ---------
> 400.perlbench    9770       251      38.9 S     9770       252      38.8 S
> 400.perlbench    9770       250      39.1 *     9770       251      39.0 *
> 401.bzip2        9650       399      24.2 S     9650       397      24.3 S
> 401.bzip2        9650       395      24.4 *     9650       395      24.4 *
> 403.gcc          8050       246      32.8 S     8050       245      32.9 S
> 403.gcc          8050       244      33.0 *     8050       243      33.1 *
> 429.mcf          9120       251      36.3 S     9120       248      36.8 *
> 429.mcf          9120       250      36.5 *     9120       248      36.8 S
> 445.gobmk       10490       394      26.6 S    10490       392      26.8 *
> 445.gobmk       10490       393      26.7 *    10490       392      26.8 S
> 456.hmmer        9330       389      24.0 S     9330       388      24.0 *
> 456.hmmer        9330       389      24.0 *     9330       389      24.0 S
> 458.sjeng       12100       447      27.1 *    12100       439      27.5 *
> 458.sjeng       12100       449      27.0 S    12100       449      26.9 S
> 462.libquantum  20720       309      67.0 S    20720       307      67.5 S
> 462.libquantum  20720       302      68.7 *    20720       300      69.1 *
> 464.h264ref     22130       457      48.5 S    22130       459      48.2 S
> 464.h264ref     22130       456      48.6 *    22130       459      48.2 *
> 471.omnetpp      6250       307      20.4 *     6250       308      20.3 *
> 471.omnetpp      6250       317      19.7 S     6250       310      20.2 S
> 473.astar        7020       346      20.3 *     7020       347      20.2 *
> 473.astar        7020       346      20.3 S     7020       347      20.2 S
> 483.xalancbmk    6900       198      34.8 *     6900       199      34.7 *
> 483.xalancbmk    6900       202      34.2 S     6900       203      34.1 S
>
> > 'Allan