[Bug target/80846] auto-vectorized AVX2 horizontal sum should narrow to 128b right away, to be more efficient for Ryzen and Intel

peter at cordes dot ca Tue, 16 Jan 2018 12:48:53 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80846


--- Comment #28 from Peter Cordes <peter at cordes dot ca> ---
(In reply to Richard Biener from comment #27)
> Note that this is deliberately left as-is because the target advertises
> (cheap) support for horizontal reduction.  The vectorizer simply generates
> a single statement for the reduction epilogue:
>  [...]
> so either the target shouldn't tell the vectorizer it supports this or
> it simply needs to expand to better code.  Which means - can you open
> a separate bug for this?

Yes; I was incorrectly assuming the inefficient asm had the same cause as
before.  I agree *this* is fixed, thanks for the explanation of how gcc was
arriving at this sequence.

I'll have a look at the backend canned sequence defs and see if there are any
other sub-optimal ones, or if it was only AVX.

Having canned sequences for different target instruction sets instead of
leaving it to arch-independent code seems like it should be an improvement over
the old design.
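
For readers following along, here is a rough sketch of what "narrow to 128b right away" in the bug title means for an eight-lane 32-bit sum. It is a scalar model in plain C, not compiler output and not the sequence gcc's backend actually uses. The function name hsum8_narrow_first is hypothetical, and the instruction names in the comments are only the usual AVX2/SSE2 candidates for each step.

```c
#include <stdint.h>

/* Scalar model of a horizontal sum over the eight 32-bit lanes of one
   256-bit vector.  Hypothetical helper for illustration only.
   Unsigned arithmetic is used so that overflow wraps the way packed
   integer adds do. */
int32_t hsum8_narrow_first(const int32_t v[8])
{
    uint32_t x[4];

    /* Step 1: add the high 128-bit half to the low half, so every
       later step works on a 128-bit value
       (roughly vextracti128 + vpaddd xmm). */
    for (int i = 0; i < 4; i++)
        x[i] = (uint32_t)v[i] + (uint32_t)v[i + 4];

    /* Step 2: add the upper 64 bits to the lower 64 bits
       (roughly pshufd + paddd). */
    uint32_t y0 = x[0] + x[2];
    uint32_t y1 = x[1] + x[3];

    /* Step 3: add the two remaining lanes and return lane 0
       (roughly pshufd + paddd + movd). */
    return (int32_t)(y0 + y1);
}
```

Only step 1 touches a 256-bit value; the remaining shuffles and adds are 128-bit operations, which is the property the bug title asks for on Ryzen and Intel.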
