> Currently the unaligned YMM and ZMM load and store costs are cheaper
> than the aligned ones, which causes the vectorizer to purposely
> mis-align accesses by adding an alignment prologue.  It looks like the
> unaligned costs were simply left untouched from znver3, where they
> equal the aligned costs, when the aligned costs were tweaked for
> znver4.  The following makes the unaligned costs equal to the aligned
> costs.
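> 
> As an illustrative sketch only (not the reproducer from the PR, and
> the compile options are assumed), a simple loop like the following,
> built with something along the lines of -O3 -march=znver4, is the kind
> of case where the vectorizer consults the aligned vs. unaligned vector
> load/store cost entries changed below when deciding whether to peel
> for alignment:
> 
>   /* Hypothetical example; with the old cost table the unaligned
>      YMM/ZMM accesses look cheaper than the aligned ones, skewing
>      the peeling decision.  */
>   void
>   scale (double *restrict dst, const double *restrict src, int n)
>   {
>     for (int i = 0; i < n; ++i)
>       dst[i] = src[i] * 2.0;
>   }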
> 
> This avoids the miscompile seen in PR115843, but it's of course not
> a real fix for the issue uncovered there.  It does, however, make
> the change qualify as a regression fix.
> 
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> 
> OK for trunk and affected branches?  It also affects the gcc11 branch

Looks good to me.  I think it was my omission.  I should remember that
the costs are there multiple times.

Maybe wait for SPEC tester before backporting to branches?
Honza
> where znver4 support/costs are new for 11.5 and thus it affects the
> release candidate.  The alternative options are to revert the zen4
> backports or to leave the costs broken.
> 
> Thanks,
> Richard.
> 
>       PR tree-optimization/115843
>       * config/i386/x86-tune-costs.h (znver4_cost): Update unaligned
>       load and store cost from the aligned costs.
> ---
>  gcc/config/i386/x86-tune-costs.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
> index a933794ed50..2ac75c35aee 100644
> --- a/gcc/config/i386/x86-tune-costs.h
> +++ b/gcc/config/i386/x86-tune-costs.h
> @@ -1924,8 +1924,8 @@ struct processor_costs znver4_cost = {
>                                          in 32bit, 64bit, 128bit, 256bit and 512bit */
>    {8, 8, 8, 12, 12},                 /* cost of storing SSE register
>                                          in 32bit, 64bit, 128bit, 256bit and 512bit */
> -  {6, 6, 6, 6, 6},                   /* cost of unaligned loads.  */
> -  {8, 8, 8, 8, 8},                   /* cost of unaligned stores.  */
> +  {6, 6, 10, 10, 12},                        /* cost of unaligned loads.  */
> +  {8, 8, 8, 12, 12},                 /* cost of unaligned stores.  */
>    2, 2, 2,                           /* cost of moving XMM,YMM,ZMM
>                                          register.  */
>    6,                                 /* cost of moving SSE register to integer.  */
> -- 
> 2.35.3
