https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80846
--- Comment #28 from Peter Cordes <peter at cordes dot ca> ---
(In reply to Richard Biener from comment #27)
> Note that this is deliberately left as-is because the target advertises
> (cheap) support for horizontal reduction. The vectorizer simply generates
> a single statement for the reduction epilogue:
> [...]
> so either the target shouldn't tell the vectorizer it supports this or
> it simply needs to expand to better code. Which means - can you open
> a separate bug for this?

Yes; I was incorrectly assuming the inefficient asm had the same cause as before. I agree *this* is fixed, thanks for the explanation of how gcc was arriving at this sequence.

I'll have a look at the backend canned sequence defs and see if there are any other sub-optimal ones, or if it was only AVX.

Having canned sequences for different target instruction sets instead of leaving it to arch-independent code seems like it should be an improvement over the old design.
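
For readers without the earlier comments at hand, here is a minimal sketch of the kind of code under discussion. It is an illustration only, not the test case from this report: `sum_ints` and `hsum8` are hypothetical names. `sum_ints` is an ordinary reduction loop that a vectorizer would typically turn into a vector loop followed by a reduction epilogue. `hsum8` is a scalar model of one way such an epilogue could combine 8 lanes by repeated halving; it is not a claim about the sequence gcc actually emits.

```c
#include <stddef.h>

/* Hypothetical example: a plain integer sum.  When vectorized, the loop
 * body accumulates several lanes in parallel, and the epilogue must add
 * those lanes together (the "horizontal reduction"). */
int sum_ints(const int *arr, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += arr[i];
    return sum;
}

/* Scalar model of a horizontal reduction over 8 lanes (assumed width).
 * Each step adds the high half onto the low half: 8 -> 4 -> 2 -> 1. */
int hsum8(const int v[8])
{
    int q[4], p[2];

    for (int i = 0; i < 4; i++)   /* fold the upper 4 lanes into the lower 4 */
        q[i] = v[i] + v[i + 4];
    for (int i = 0; i < 2; i++)   /* fold the upper 2 lanes into the lower 2 */
        p[i] = q[i] + q[i + 2];
    return p[0] + p[1];           /* final add of the last two lanes */
}
```

Whether the target expands the single reduction statement into something resembling the halving steps above, or into a less efficient sequence, is up to the backend, which is the point made in the quoted comment.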