On Wed, May 30, 2018 at 11:25 AM Richard Biener <richard.guent...@gmail.com>
wrote:

> On Tue, May 29, 2018 at 5:24 PM Allan Sandfeld Jensen <li...@carewolf.com>
> wrote:

> > On Dienstag, 29. Mai 2018 16:57:56 CEST Richard Biener wrote:
> > >
> > > so the situation improves but isn't fully fixed (STLF issues maybe?)
> > >

> > That raises the question of whether it helps in these cases even in -O3?

> That's a good question indeed.  We might end up (partly) BB vectorizing
> loop bodies that we'd otherwise loop vectorize with SLP.  Benchmarking
> with BB vectorization disabled at -O3+ isn't something I've done in the
> past.  I'm now doing a 2-run with -march=haswell -Ofast
> [-fno-tree-slp-vectorize]
> for the FP benchmarks.

                 Base     Base       Base        Peak     Peak       Peak
Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
-------------- ------  ---------  ---------    ------  ---------  ---------
410.bwaves      13590        178       76.4 *   13590        180       75.5 S
410.bwaves      13590        180       75.6 S   13590        179       76.0 *
416.gamess      19580        604       32.4 S   19580        576       34.0 S
416.gamess      19580        604       32.4 *   19580        575       34.0 *
433.milc         9180        339       27.1 *    9180        345       26.6 S
433.milc         9180        343       26.7 S    9180        343       26.8 *
434.zeusmp       9100        234       38.9 *    9100        234       38.9 S
434.zeusmp       9100        234       38.8 S    9100        234       38.9 *
435.gromacs      7140        251       28.5 *    7140        251       28.4 *
435.gromacs      7140        252       28.3 S    7140        252       28.3 S
436.cactusADM   11950        278       43.0 S   11950        222       53.8 S
436.cactusADM   11950        223       53.7 *   11950        221       54.1 *
437.leslie3d     9400        214       43.9 *    9400        215       43.6 *
437.leslie3d     9400        217       43.3 S    9400        222       42.4 S
444.namd         8020        302       26.5 S    8020        303       26.5 S
444.namd         8020        302       26.6 *    8020        303       26.5 *
447.dealII      11440        259       44.2 *   11440        246       46.6 *
447.dealII      11440        259       44.1 S   11440        246       46.6 S
450.soplex       8340        219       38.0 *    8340        219       38.0 *
450.soplex       8340        221       37.7 S    8340        221       37.7 S
453.povray       5320        108       49.2 *    5320        109       48.7 S
453.povray       5320        108       49.1 S    5320        109       48.8 *
454.calculix     8250        270       30.6 *    8250        269       30.6 *
454.calculix     8250        271       30.5 S    8250        270       30.5 S
459.GemsFDTD    10610        308       34.5 S   10610        306       34.7 S
459.GemsFDTD    10610        306       34.7 *   10610        306       34.7 *
465.tonto        9840        428       23.0 S    9840        423       23.3 *
465.tonto        9840        426       23.1 *    9840        423       23.2 S
470.lbm         13740        253       54.4 S   13740        252       54.5 *
470.lbm         13740        252       54.5 *   13740        252       54.5 S
481.wrf         11170        265       42.1 *   11170        265       42.2 S
481.wrf         11170        266       42.1 S   11170        264       42.3 *
482.sphinx3     19490        401       48.6 *   19490        402       48.5 S
482.sphinx3     19490        405       48.1 S   19490        399       48.9 *

so we can indeed see similar detrimental effects on 416.gamess;  447.dealII
seems to improve with BB vectorization.

That means the 416.gamess slowdown is definitely worth investigating
since it reproduces with both AVX128 and AVX256 and with loop
vectorization.  I'll open a bug for it.

> Note that there were some cases where disabling vectorization wholly
> improved things.

> > Anyway it doesn't look good for it. Did the binary size at least improve
> with
> > prefer-avx128, or was that also worse or insignificant?

> Similar to the AVX256 results.  I guess where AVX256 applied we now simply
> do two vector ops with AVX128.
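As a hedged illustration of that point (my own example, not taken from the
benchmarks): for a simple kernel like the one below, GCC at -O3 with AVX2 can
process eight floats per 256-bit ymm operation, while with -mprefer-avx128 it
instead emits pairs of 128-bit xmm operations, so the instruction count grows
but overall code size tends to stay in the same ballpark.

```c
/* Hypothetical saxpy-style kernel.  With 256-bit vectors one vector
   op covers 8 floats per iteration; with -mprefer-avx128 the same
   work is done as two 128-bit ops, halving the vector width.  */
void
saxpy (float *restrict y, const float *restrict x, float a, int n)
{
  for (int i = 0; i < n; i++)
    y[i] += a * x[i];
}
```

Either way the scalar semantics are identical; only the emitted vector width
differs, which matches the observation that code size was similar.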

> Richard.


> > 'Allan
