On Tue, May 29, 2018 at 11:32 AM Richard Biener <richard.guent...@gmail.com>
wrote:

> On Mon, May 28, 2018 at 5:50 PM Allan Sandfeld Jensen <li...@carewolf.com>
> wrote:

> > On Montag, 28. Mai 2018 12:58:20 CEST Richard Biener wrote:
> > > compile-time effects of the patch on that. Embedded folks may want to run
> > > their favorite benchmark and report results as well.
> > >
> > > So I did a -O2 -march=haswell [-ftree-slp-vectorize] SPEC CPU 2006 compile
> > > and run, and the compile-time effect, where measurable (SPEC records at
> > > one-second granularity), is within one second per benchmark, apart from
> > > 410.bwaves (from 3s to 5s) and 481.wrf (76s to 78s). Performance-wise I
> > > notice significant slowdowns for SPEC FP and some for SPEC INT (I only
> > > did a train run so far). I'll re-run with ref input now and will post
> > > those numbers.
> > >
> > If you continue to see slowdowns, could you check with either no AVX or with
> > -mprefer-avx128? The occasional AVX256 instructions might be downclocking the
> > CPU. But yes, that would be a problem for this change on its own.

> So here's a complete two-run with ref input; peak is -O2 -march=haswell
> -ftree-slp-vectorize. It confirms the slowdowns in SPEC FP but not in
> SPEC INT. You are right that using AVX256 (or AVX512) might be problematic
> on its own, but that is not restricted to -O2 -ftree-slp-vectorize; it also
> applies to -O3. I will re-benchmark the SPEC FP part with -mprefer-avx128
> to see if that is the issue. Note I did not use any -ffast-math flags in
> the experiment - those are as "unlikely" as using -march=native together
> with -O2. In theory another issue is the ability to debug code.

>                  Base     Base       Base        Peak     Peak       Peak
> Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
> -------------- ------  ---------  ---------    ------  ---------  ---------
> 410.bwaves      13590        362       37.5 *   13590        370      36.7 *
> 410.bwaves      13590        365       37.2 S   13590        377      36.0 S
> 416.gamess      19580        558       35.1 *   19580        598      32.7 *
> 416.gamess      19580        560       35.0 S   19580        600      32.6 S
> 433.milc         9180        331       27.8 S    9180        374      24.6 *
> 433.milc         9180        331       27.8 *    9180        383      24.0 S
> 434.zeusmp       9100        301       30.2 S    9100        301      30.2 *
> 434.zeusmp       9100        301       30.2 *    9100        302      30.1 S
> 435.gromacs      7140        300       23.8 S    7140        303      23.6 S
> 435.gromacs      7140        298       23.9 *    7140        301      23.8 *
> 436.cactusADM   11950        495       24.1 S   11950        482      24.8 *
> 436.cactusADM   11950        486       24.6 *   11950        484      24.7 S
> 437.leslie3d     9400        289       32.5 *    9400        288      32.6 *
> 437.leslie3d     9400        301       31.3 S    9400        289      32.5 S
> 444.namd         8020        301       26.6 *    8020        301      26.6 *
> 444.namd         8020        301       26.6 S    8020        301      26.6 S
> 447.dealII      11440        255       44.9 *   11440        252      45.3 *
> 447.dealII      11440        255       44.9 S   11440        253      45.3 S
> 450.soplex       8340        212       39.4 S    8340        213      39.1 S
> 450.soplex       8340        211       39.5 *    8340        211      39.5 *
> 453.povray       5320        111       47.9 S    5320        113      47.0 S
> 453.povray       5320        111       48.0 *    5320        113      47.2 *
> 454.calculix     8250        748       11.0 *    8250        835      9.88 *
> 454.calculix     8250        748       11.0 S    8250        835      9.88 S
> 459.GemsFDTD    10610        324       32.8 S   10610        324      32.8 S
> 459.GemsFDTD    10610        323       32.9 *   10610        323      32.9 *
> 465.tonto        9840        449       21.9 S    9840        469      21.0 *
> 465.tonto        9840        446       22.0 *    9840        469      21.0 S
> 470.lbm         13740        253       54.3 *   13740        255      53.9 S
> 470.lbm         13740        253       54.2 S   13740        254      54.2 *
> 481.wrf         11170        415       26.9 *   11170        416      26.9 S
> 481.wrf         11170        417       26.8 S   11170        416      26.9 *
> 482.sphinx3     19490        456       42.7 *   19490        465      41.9 *
> 482.sphinx3     19490        464       42.0 S   19490        468      41.6 S
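
For reading the Ratio columns: SPEC CPU 2006 reports each ratio as the
reference time divided by the measured run time, so higher is better. A
minimal Python sketch (the function name is made up for illustration, it is
not part of the SPEC tooling) that reproduces two of the peak/base ratios
from the rounded run times in the table:

```python
def spec_ratio(ref_seconds: float, run_seconds: float) -> float:
    """SPEC-style ratio: reference time divided by measured run time."""
    return ref_seconds / run_seconds


if __name__ == "__main__":
    # 410.bwaves, base run marked '*': Ref. 13590, run time 362 s
    print(round(spec_ratio(13590, 362), 1))
    # 454.calculix, peak run marked '*': Ref. 8250, run time 835 s
    print(round(spec_ratio(8250, 835), 2))
```

Other rows can differ in the last digit when recomputed this way, because
the table only shows run times rounded to whole seconds.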

Numbers with -mprefer-avx128:

                 Base     Base       Base        Peak     Peak       Peak
Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
-------------- ------  ---------  ---------    ------  ---------  ---------
410.bwaves                                      13590        365       37.2 *
410.bwaves                                      13590        374       36.4 S
416.gamess                                      19580        596       32.9 *
416.gamess                                      19580        596       32.8 S
433.milc                                         9180        378       24.3 S
433.milc                                         9180        375       24.5 *
434.zeusmp                                       9100        302       30.1 S
434.zeusmp                                       9100        302       30.2 *
435.gromacs                                      7140        299       23.9 *
435.gromacs                                      7140        299       23.9 S
436.cactusADM                                   11950        483       24.7 S
436.cactusADM                                   11950        482       24.8 *
437.leslie3d                                     9400        290       32.5 *
437.leslie3d                                     9400        302       31.1 S
444.namd                                         8020        301       26.6 *
444.namd                                         8020        301       26.6 S
447.dealII                                      11440        253       45.2 *
447.dealII                                      11440        253       45.2 S
450.soplex                                       8340        212       39.3 S
450.soplex                                       8340        211       39.5 *
454.calculix                                     8250        750       11.0 *
454.calculix                                     8250        750       11.0 S
459.GemsFDTD                                    10610        323       32.9 *
459.GemsFDTD                                    10610        323       32.8 S
465.tonto                                        9840        466       21.1 *
465.tonto                                        9840        466       21.1 S
470.lbm                                         13740        254       54.2 *
470.lbm                                         13740        255       54.0 S
481.wrf                                         11170        417       26.8 *
481.wrf                                         11170        417       26.8 S
482.sphinx3                                     19490        465       41.9 *
482.sphinx3                                     19490        473       41.2 S

So the situation improves but isn't fully fixed (store-to-load forwarding
(STLF) issues maybe?)
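
To put rough numbers on that, here is a small Python sketch that compares
the run times of the '*'-marked runs for four of the slower SPEC FP
benchmarks. The run times are copied from the tables in this mail; the
helper and variable names are only illustrative.

```python
def slowdown_percent(base_seconds: float, other_seconds: float) -> float:
    """Run-time increase relative to base, in percent (positive = slower)."""
    return (other_seconds - base_seconds) / base_seconds * 100.0


# Run times in seconds for the runs marked '*':
# (base -O2, peak with -ftree-slp-vectorize, peak plus -mprefer-avx128)
SELECTED_RUN_TIMES = {
    "416.gamess": (558, 598, 596),
    "433.milc": (331, 374, 375),
    "454.calculix": (748, 835, 750),
    "465.tonto": (446, 469, 466),
}

if __name__ == "__main__":
    for name, (base, peak, peak128) in SELECTED_RUN_TIMES.items():
        print(f"{name:<14} peak {slowdown_percent(base, peak):+5.1f}%  "
              f"avx128 {slowdown_percent(base, peak128):+5.1f}%")
```

On these rounded run times only 454.calculix gets back to roughly the base
run time with -mprefer-avx128; the other three stay about 4% to 13% slower
than base.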


>                  Base     Base       Base        Peak     Peak       Peak
> Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
> -------------- ------  ---------  ---------    ------  ---------  ---------
> 400.perlbench    9770        251       38.9 S    9770        252      38.8 S
> 400.perlbench    9770        250       39.1 *    9770        251      39.0 *
> 401.bzip2        9650        399       24.2 S    9650        397      24.3 S
> 401.bzip2        9650        395       24.4 *    9650        395      24.4 *
> 403.gcc          8050        246       32.8 S    8050        245      32.9 S
> 403.gcc          8050        244       33.0 *    8050        243      33.1 *
> 429.mcf          9120        251       36.3 S    9120        248      36.8 *
> 429.mcf          9120        250       36.5 *    9120        248      36.8 S
> 445.gobmk       10490        394       26.6 S   10490        392      26.8 *
> 445.gobmk       10490        393       26.7 *   10490        392      26.8 S
> 456.hmmer        9330        389       24.0 S    9330        388      24.0 *
> 456.hmmer        9330        389       24.0 *    9330        389      24.0 S
> 458.sjeng       12100        447       27.1 *   12100        439      27.5 *
> 458.sjeng       12100        449       27.0 S   12100        449      26.9 S
> 462.libquantum  20720        309       67.0 S   20720        307      67.5 S
> 462.libquantum  20720        302       68.7 *   20720        300      69.1 *
> 464.h264ref     22130        457       48.5 S   22130        459      48.2 S
> 464.h264ref     22130        456       48.6 *   22130        459      48.2 *
> 471.omnetpp      6250        307       20.4 *    6250        308      20.3 *
> 471.omnetpp      6250        317       19.7 S    6250        310      20.2 S
> 473.astar        7020        346       20.3 *    7020        347      20.2 *
> 473.astar        7020        346       20.3 S    7020        347      20.2 S
> 483.xalancbmk    6900        198       34.8 *    6900        199      34.7 *
> 483.xalancbmk    6900        202       34.2 S    6900        203      34.1 S


> > 'Allan
