https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008
--- Comment #8 from sergey.shalnov at intel dot com ---
Richard,

These are great changes, and I see that the first loop is now vectorized for
the test example I provided, using gcc-8.0 main trunk.

But I think the issue is a bit more complicated. Vectorization of the first
loop just hides the issue I reported.

Currently I see the following.

Test loop:
    for (int i = 0; i < 4; i++, input1 += 4, input2 += 4)

test_bugzilla.c:6:5: note: Cost model analysis:.
  Vector inside of loop cost: 1136
  Vector prologue cost: 0
  Vector epilogue cost: 0
  Scalar iteration cost: 328
  Scalar outside cost: 0
  Vector outside cost: 0
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 0
test_bugzilla.c:6:5: note: Runtime profitability threshold = 4
test_bugzilla.c:6:5: note: Static estimate profitability threshold = 4
test_bugzilla.c:6:5: note: loop vectorized

If I slightly change the loop (to be closer to the real application):
    for (int i = 0; i < 4; i++, input1 += stride1, input2 += stride2)

test_bugzilla1.c:6:5: note: Cost model analysis:.
  Vector inside of loop cost: 5232
  Vector prologue cost: 0
  Vector epilogue cost: 0
  Scalar iteration cost: 328
  Scalar outside cost: 0
  Vector outside cost: 0
  prologue iterations: 0
  epilogue iterations: 0
test_bugzilla1.c:6:5: note: cost model: the vector iteration cost = 5232
divided by the scalar iteration cost = 328 is greater or equal to the
vectorization factor = 4.
test_bugzilla1.c:6:5: note: not vectorized: vectorization not profitable.
test_bugzilla1.c:6:5: note: not vectorized: vector version will never be
profitable.

So the loop with variable strides is still not vectorized, and the issue with
extra vector operations remains the same.

I'm not sure, but I think it would be really profitable to avoid using vector
registers if the loop is not vectorized. Do you agree?

Sergey
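
For reference, the comparison described in the second dump can be sketched
roughly as below. This is only an illustration of the arithmetic quoted in the
dump message, not GCC's actual implementation; the function name and parameter
names are hypothetical, and only the inside-of-loop costs are considered (the
prologue, epilogue and outside costs are all 0 in both dumps).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not a GCC function.
   Returns true if one vector iteration is cheaper than the `vf` scalar
   iterations it replaces, i.e. if vectorization can ever pay off. */
static bool vector_loop_can_be_profitable(int vec_inside_cost,
                                          int scalar_iter_cost,
                                          int vf)
{
    assert(vf > 0 && scalar_iter_cost > 0);

    /* The dump rejects the loop when
       vec_inside_cost / scalar_iter_cost >= vf,
       which is the same as vec_inside_cost >= scalar_iter_cost * vf. */
    return vec_inside_cost < scalar_iter_cost * vf;
}
```

With the numbers above, the constant-stride loop gives 1136 < 328 * 4 = 1312,
so the check passes, while the variable-stride loop gives 5232 >= 1312, which
matches the "vector version will never be profitable" note.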