> 
> A previous patch of mine correcting the vectorizer target cost model
> to properly cost scalar FP ops vs. scalar INT ops regressed
> 416.gamess by ~10% on all modern x86 archs.
> 
> The following mitigates this in the cost modeling by noticing
> the vectorized loop in question has all loads and stores performed
> strided (built up from scalar loads/stores) and building upon
> the pessimization of strided loads added last year.
> 
> The first half treats strided stores the same as strided
> loads, which may make sense (though the latency and dependence
> arguments do not apply here). Unfortunately that alone
> doesn't make 416.gamess vectorization fail because we end up
> with TYPE_VECTOR_SUBPARTS == 2 (AVX256 vectorization is already
> rejected for cost reasons). The second half then pushes it over
> the edge by adjusting the previous pessimization to multiply by
> TYPE_VECTOR_SUBPARTS + 1 instead of just TYPE_VECTOR_SUBPARTS,
> which makes the biggest difference for smaller vectors.
> 
> I've benchmarked this on a Haswell machine with SPEC 2006,
> confirming the regression is fixed, and re-benchmarked the
> apparent regressions with 3 runs, confirming they were noise;
> we may even end up with a progression there
> (see the bugzilla audit-trail for details).
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> OK for trunk?
> 
> Note I'm going to apply this as two revisions to allow bisection
> between the two changes, first pessimizing strided stores
> and then adjusting the factor.
> 
> Thanks,
> Richard.
> 
> 2019-03-15  Richard Biener  <rguent...@suse.de>
> 
> 	PR target/87561
> 	* config/i386/i386.c (ix86_add_stmt_cost): Apply strided
> 	load pessimization to stores as well.
> 	* config/i386/i386.c (ix86_add_stmt_cost): Pessimize strided
> 	loads and stores a bit more.
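To put a number on "makes the biggest difference for smaller vectors", here is a minimal sketch of the factor change. The helpers `strided_cost_old`, `strided_cost_new` and `strided_cost_increase_pct` are hypothetical and are not taken from `ix86_add_stmt_cost`; they only reproduce the multiplication described in the quoted patch, with `base` standing in for the per-statement construction cost from the target cost tables and `subparts` for TYPE_VECTOR_SUBPARTS.

```c
/* Hypothetical helpers, not GCC code: they mirror only the arithmetic
   of the strided load/store pessimization.  BASE stands in for the
   unscaled construction cost of one strided access, SUBPARTS for
   TYPE_VECTOR_SUBPARTS of the vector type.  */

/* Previous scaling: multiply by the number of vector elements.  */
unsigned
strided_cost_old (unsigned base, unsigned subparts)
{
  return base * subparts;
}

/* Adjusted scaling: multiply by the number of elements plus one.  */
unsigned
strided_cost_new (unsigned base, unsigned subparts)
{
  return base * (subparts + 1);
}

/* Relative increase of the new penalty over the old one, in percent.
   Algebraically this is 100 / SUBPARTS, independent of BASE, so the
   extra "+ 1" weighs most on vectors with few elements.  */
double
strided_cost_increase_pct (unsigned base, unsigned subparts)
{
  unsigned old_cost = strided_cost_old (base, subparts);
  unsigned new_cost = strided_cost_new (base, subparts);
  return 100.0 * (double) (new_cost - old_cost) / (double) old_cost;
}
```

With two elements (the TYPE_VECTOR_SUBPARTS == 2 case left for the 416.gamess loop) the penalty grows by 50%, with four elements by 25% and with eight by 12.5%, independent of the base cost.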
Looks good to me. Store costs are even more iffy than other costs because
they are not part of the dependency chain, so I guess whatever seems to
work best in practice is good.

Honza