Re: [PATCH] Fix PR87561, 416.gamess slowdown

Jan Hubicka Fri, 15 Mar 2019 05:54:57 -0700

> 
> A previous patch of mine correcting the vectorizer target cost model
> to properly cost scalar FP ops vs. scalar INT ops regressed
> 416.gamess by ~10% on all modern x86 archs.
> 
> The following mitigates this in the cost modeling by noticing
> the vectorized loop in question has all loads and stores performed
> strided (built up from scalar loads/stores) and building upon
> the pessimization of strided loads added last year.
> 
> The first half is treating strided stores the same as strided
> loads which may make sense (but the latency and dependence
> arguments do not count here).  Unfortunately that alone
> doesn't make 416.gamess vectorization fail because we end up
> with TYPE_VECTOR_SUBPARTS == 2 (AVX256 vectorization is rejected
> due to cost reasons already).  Now comes the second half
> which is to push it over the edge, adjusting the previous
> pessimization by multiplying with TYPE_VECTOR_SUBPARTS + 1
> instead of just TYPE_VECTOR_SUBPARTS which makes the biggest
> difference for smaller vectors.
> 
> I've benchmarked this on a Haswell machine with SPEC 2006
> confirming the regression is fixed and re-benchmarked
> apparent regressions with 3 runs confirming that was noise
> and we end up with maybe even a progression there
> (see the bugzilla audit-trail for details).
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> OK for trunk?
> 
> Note I'm going to apply as two revisions to allow bisection
> between the two changes, first pushing pessimizing strided
> stores and then adjusting the factor.
> 
> Thanks,
> Richard.
> 
> 2019-03-15  Richard Biener  <rguent...@suse.de>
> 
>       PR target/87561
>       * config/i386/i386.c (ix86_add_stmt_cost): Apply strided
>       load pessimization to stores as well.
>       * config/i386/i386.c (ix86_add_stmt_cost): Pessimize strided
>       loads and stores a bit more.
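
For readers following along, the effect of the second half (multiplying by TYPE_VECTOR_SUBPARTS + 1 rather than TYPE_VECTOR_SUBPARTS) can be illustrated with a small standalone sketch. This is not the code in ix86_add_stmt_cost; the helper names, the `extra_one` flag, and the base cost values are hypothetical and only model the arithmetic described in the quoted mail.

```c
#include <assert.h>

/* Hypothetical helper, not GCC code: scale the base cost of an
   elementwise (strided) vector load or store by a factor derived
   from the number of vector lanes (nunits).  With extra_one set,
   the factor is nunits + 1, otherwise it is nunits.  */
unsigned
strided_access_cost (unsigned base_cost, unsigned nunits, int extra_one)
{
  assert (nunits > 0);
  unsigned factor = extra_one ? nunits + 1 : nunits;
  return base_cost * factor;
}

/* Relative cost increase, in whole percent (rounded down), from
   switching the factor from nunits to nunits + 1.  Algebraically
   100 * (nunits + 1) / nunits - 100 == 100 / nunits.  */
unsigned
extra_penalty_percent (unsigned nunits)
{
  assert (nunits > 0);
  return 100 / nunits;
}
```

Under these assumptions a two-lane vector sees its strided access cost grow by 50% (factor 2 becomes 3), while a four-lane vector sees 25% and an eight-lane vector about 12%, which matches the remark that the change matters most for smaller vectors.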


Looks good to me.  Store costs are even more iffy than other costs
because they are not part of a dependency chain, so I guess whatever seems
to work best in practice is good.

Honza
