> Currently unaligned YMM and ZMM load and store costs are cheaper than
> aligned which causes the vectorizer to purposely mis-align accesses
> by adding an alignment prologue. It looks like the unaligned costs
> were simply left untouched from znver3 where they equate the aligned
> costs when tweaking aligned costs for znver4. The following makes
> the unaligned costs equal to the aligned costs.
>
> This avoids the miscompile seen in PR115843 but it's of course not
> a real fix for the issue uncovered there. But it makes it qualify
> as a regression fix.
>
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>
> OK for trunk and affected branches? It also affects the gcc11 branch
Looks good to me. I think it was my omission. I should remember that
the costs are there multiple times. Maybe wait for SPEC tester before
backporting to branches?

Honza

> where znver4 support/costs are new for 11.5 and thus it affects the
> release candidate. The alternative option is to revert the zen4
> backports or leave the costs broken.
>
> Thanks,
> Richard.
>
>     PR tree-optimization/115843
>     * config/i386/x86-tune-costs.h (znver4_cost): Update unaligned
>     load and store cost from the aligned costs.
> ---
>  gcc/config/i386/x86-tune-costs.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/i386/x86-tune-costs.h
> b/gcc/config/i386/x86-tune-costs.h
> index a933794ed50..2ac75c35aee 100644
> --- a/gcc/config/i386/x86-tune-costs.h
> +++ b/gcc/config/i386/x86-tune-costs.h
> @@ -1924,8 +1924,8 @@ struct processor_costs znver4_cost = {
>                                         in 32bit, 64bit, 128bit, 256bit and
> 512bit */
>    {8, 8, 8, 12, 12},                /* cost of storing SSE register
>                                         in 32bit, 64bit, 128bit, 256bit and
> 512bit */
> -  {6, 6, 6, 6, 6},                  /* cost of unaligned loads. */
> -  {8, 8, 8, 8, 8},                  /* cost of unaligned stores. */
> +  {6, 6, 10, 10, 12},               /* cost of unaligned loads. */
> +  {8, 8, 8, 12, 12},                /* cost of unaligned stores. */
>    2, 2, 2,                          /* cost of moving XMM,YMM,ZMM
> register. */
>    6,                                /* cost of moving SSE register to
> integer. */
> --
> 2.35.3
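
For illustration only, a minimal sketch of the comparison the patch changes. The struct and function names below are hypothetical and are not the real layout of struct processor_costs. The aligned load row is an assumption: it is not visible in the hunk and is taken to equal the new unaligned row, since the ChangeLog says the unaligned costs are updated from the aligned ones. The other three rows are copied from the diff.

```c
/* Slots in each cost row: 32, 64, 128, 256 and 512 bit accesses.  */
enum { W32, W64, W128, W256, W512, NWIDTHS };

/* Hypothetical, simplified stand-in for the SSE load/store rows of
   struct processor_costs; field names are illustrative only.  */
struct sse_mem_costs
{
  int load[NWIDTHS];            /* aligned loads */
  int store[NWIDTHS];           /* aligned stores */
  int unaligned_load[NWIDTHS];
  int unaligned_store[NWIDTHS];
};

/* znver4 rows before the patch.  The aligned load row is assumed,
   see the note above; the rest comes from the diff.  */
const struct sse_mem_costs znver4_before = {
  {6, 6, 10, 10, 12},
  {8, 8, 8, 12, 12},
  {6, 6, 6, 6, 6},
  {8, 8, 8, 8, 8},
};

/* znver4 rows after the patch: unaligned rows copy the aligned ones.  */
const struct sse_mem_costs znver4_after = {
  {6, 6, 10, 10, 12},
  {8, 8, 8, 12, 12},
  {6, 6, 10, 10, 12},
  {8, 8, 8, 12, 12},
};

/* Count the slots where an unaligned access is modeled as strictly
   cheaper than the aligned access of the same width.  */
int
cheaper_unaligned_slots (const struct sse_mem_costs *c)
{
  int n = 0;
  for (int i = 0; i < NWIDTHS; i++)
    {
      n += c->unaligned_load[i] < c->load[i];
      n += c->unaligned_store[i] < c->store[i];
    }
  return n;
}
```

Under these assumptions cheaper_unaligned_slots (&znver4_before) is nonzero, while cheaper_unaligned_slots (&znver4_after) is zero, so the cost tables no longer give a reason to prefer a misaligned access.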