https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96053
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-07-06
             Blocks|                            |53947
             Status|UNCONFIRMED                 |NEW
                 CC|                            |avieira at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
In the end it is indeed a costing issue (also, finding SLP sequences from
reductions is quite ad-hoc: either all reductions form an SLP sequence or
none do). There's the epilogue cost, which for SLP reductions is usually
cheaper than for reduction chains, and then there's the cost of the
participating loads and required permutations, which depends very much on
the actual case ...

For the immediate benefit I think giving more control to the user sometimes
makes sense, and if so I'd go a route like

  #pragma GCC vect [no-]reduc-chain

and document those as hints. But as you say, basing the decision on costing
would be way better.

Note ILP for the reduction chain is probably higher since both reductions
can execute in parallel, so for the simple testcase I'd expect the reduction
chain variant to be faster.

Note that for some reason your testcase vectorizes as an SLP reduction and
not as reduction chains for me on x86_64; the association seems to be off
from the vectorizer's expectation.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
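
For readers unfamiliar with the two terms used in comment #1, here is a minimal C sketch of the loop shapes involved. It is a hypothetical illustration, not the testcase attached to this bug, and the function names are made up. It assumes the usual vectorizer terminology: a reduction chain is one accumulator updated several times per iteration, while an SLP reduction groups several independent accumulators. Whether a given GCC version and target actually classifies each loop this way is not guaranteed, as the comment itself notes for x86_64. The pragma mentioned in the comment is only a suggestion there, so it is not used here.

```c
#include <stddef.h>

/* Hypothetical example; not the testcase from the bug report. */

/* One accumulator updated twice per iteration: the shape usually
   described as a reduction chain. */
int sum_chain(const int *a, size_t n_pairs)
{
    int sum = 0;
    for (size_t i = 0; i < n_pairs; ++i) {
        sum += a[2 * i];
        sum += a[2 * i + 1];
    }
    return sum;
}

/* Two independent accumulators combined after the loop: a candidate
   for an SLP reduction. */
int sum_slp(const int *a, size_t n_pairs)
{
    int even = 0, odd = 0;
    for (size_t i = 0; i < n_pairs; ++i) {
        even += a[2 * i];
        odd += a[2 * i + 1];
    }
    return even + odd;
}
```

Both functions return the same value for the same input; only the structure presented to the vectorizer differs. Compiling with -O3 -fopt-info-vec is one way to check whether each loop was vectorized on a given target.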