Currently, unaligned YMM and ZMM load and store costs are cheaper than
the aligned ones, which causes the vectorizer to purposely mis-align
accesses by adding an alignment prologue.  It looks like the unaligned
costs were simply left untouched from znver3, where they match the
aligned costs, when the aligned costs were tweaked for znver4.  The
following makes the unaligned costs equal to the aligned costs.
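For illustration, a hypothetical loop (not the PR115843 testcase; the
function name is made up) of the kind the vectorizer implements with
YMM/ZMM accesses, where the relative aligned vs. unaligned costs drive
the peeling-for-alignment decision:

  /* Compile with e.g. -O3 -march=znver4 so the znver4_cost tables are
     consulted.  When unaligned accesses are costed cheaper than aligned
     ones, the profitability check for peeling the first iterations to
     reach an aligned address comes out inverted.  */
  void
  scale (double *restrict a, const double *restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      a[i] = 2.0 * b[i];
  }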
This avoids the miscompile seen in PR115843 but it's of course not
a real fix for the issue uncovered there.  But it makes it qualify
as a regression fix.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

OK for trunk and affected branches?  It also affects the gcc11 branch
where znver4 support/costs are new for 11.5 and thus it affects the
release candidate.  The alternative option is to revert the zen4
backports or leave the costs broken.

Thanks,
Richard.

	PR tree-optimization/115843
	* config/i386/x86-tune-costs.h (znver4_cost): Update unaligned
	load and store cost from the aligned costs.
---
 gcc/config/i386/x86-tune-costs.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index a933794ed50..2ac75c35aee 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -1924,8 +1924,8 @@ struct processor_costs znver4_cost = {
 					   in 32bit, 64bit, 128bit, 256bit and 512bit */
   {8, 8, 8, 12, 12},			/* cost of storing SSE register
 					   in 32bit, 64bit, 128bit, 256bit and 512bit */
-  {6, 6, 6, 6, 6},			/* cost of unaligned loads.  */
-  {8, 8, 8, 8, 8},			/* cost of unaligned stores.  */
+  {6, 6, 10, 10, 12},			/* cost of unaligned loads.  */
+  {8, 8, 8, 12, 12},			/* cost of unaligned stores.  */
   2, 2, 2,				/* cost of moving XMM,YMM,ZMM register.  */
   6,					/* cost of moving SSE register to integer.  */
-- 
2.35.3