On Fri, 15 Mar 2019, Jan Hubicka wrote:

> > 
> > A previous patch of mine correcting the vectorizer target cost model
> > to properly cost scalar FP ops vs. scalar INT ops regressed
> > 416.gamess by ~10% on all modern x86 archs.
> > 
> > The following mitigates this in the cost modeling by noticing
> > the vectorized loop in question has all loads and stores performed
> > strided (built up from scalar loads/stores) and building upon
> > the pessimization of strided loads added last year.
> > 
> > The first half is treating strided stores the same as strided
> > loads which may make sense (but the latency and dependence
> > arguments do not count here).  Unfortunately that alone
> > doesn't make 416.gamess vectorization fail because we end up
> > with TYPE_VECTOR_SUBPARTS == 2 (AVX256 vectorization is rejected
> > due to cost reasons already).  Now comes the second half
> > which is to push it over the edge, adjusting the previous
> > pessimization by multiplying with TYPE_VECTOR_SUBPARTS + 1
> > instead of just TYPE_VECTOR_SUBPARTS which makes the biggest
> > difference for smaller vectors.
> > 
> > I've benchmarked this on a Haswell machine with SPEC 2006
> > confirming the regression is fixed and re-benchmarked
> > apparent regressions with 3 runs confirming that was noise
> > and we end up with maybe even a progression there
> > (see the bugzilla audit-trail for details).
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > 
> > OK for trunk?
> > 
> > Note I'm going to apply as two revisions to allow bisection
> > between the two changes, first pushing pessimizing strided
> > stores and then adjusting the factor.
> > 
> > Thanks,
> > Richard.
> > 
> > 2019-03-15  Richard Biener  <rguent...@suse.de>
> > 
> >     PR target/87561
> >     * config/i386/i386.c (ix86_add_stmt_cost): Apply strided
> >     load pessimization to stores as well.
> >     * config/i386/i386.c (ix86_add_stmt_cost): Pessimize strided
> >     loads and stores a bit more.
> 
> Looks good to me.  Store costs are even more iffy than other costs
> because they are not part of the dependency chain, so I guess whatever
> seems to work best in practice is good.

Applied as r269753 and r269754.  Please report any issue with this.

Richard.
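
For readers following the arithmetic in the quoted description, here is
a rough sketch of why moving the factor from TYPE_VECTOR_SUBPARTS to
TYPE_VECTOR_SUBPARTS + 1 matters most for short vectors.  This is not
the code in ix86_add_stmt_cost; the function names and the base_cost
parameter are hypothetical and only model the multiplication itself.

```c
#include <assert.h>

/* Hypothetical model, not GCC's actual code: cost of one strided
   (element-wise) vector load or store under the old factor, where
   nunits stands in for TYPE_VECTOR_SUBPARTS.  */
static unsigned
strided_cost_old (unsigned base_cost, unsigned nunits)
{
  assert (nunits > 0);
  return base_cost * nunits;
}

/* Same access under the adjusted factor, nunits + 1.  */
static unsigned
strided_cost_new (unsigned base_cost, unsigned nunits)
{
  assert (nunits > 0);
  return base_cost * (nunits + 1);
}

/* Relative increase of the new cost over the old one, in percent,
   rounded down: ((n + 1) / n - 1) * 100 == 100 / n.  */
static unsigned
increase_percent (unsigned nunits)
{
  assert (nunits > 0);
  return 100 / nunits;
}
```

With two lanes, the case mentioned for 416.gamess, the modelled cost
rises by 50%; with four lanes it rises by 25%.  The extra penalty
shrinks as the vector gets wider.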
