https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77902
            Bug ID: 77902
           Summary: Auto-vectorizes epilogue loops of manually vectorized functions
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: linux at carewolf dot com
  Target Milestone: ---

Created attachment 39774
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39774&action=edit
Example that triggers the pointless auto-vectorization

A common pattern when manually vectorizing an inner function is to have a
small epilogue that handles the remainder of the input vector that cannot be
handled by the vectorized stepping. For instance:

    int i = 0;
    for (; i < (count - 3); i += 4)
        // do 4 at a time
    for (; i < count; ++i)
        // do 1 at a time

When compiled with -O3 or -ftree-loop-vectorize, that last epilogue may be
auto-vectorized by GCC even though it can run at most 3 times, so the
auto-vectorized code path will never be executed.

Rewriting it as

    int i = 0;
    for (; i < (count - 3); i += 4)
        // do 4 at a time
    for (int _i = 0; _i < 3 && i < count; ++_i, ++i)
        // do 1 at a time

fixes the issue.

I am guessing GCC would do well to learn a range from the main loop, so that
it can figure out on its own that the epilogue cannot run more than 3 times.
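
For illustration, here is a minimal, self-contained sketch of the workaround described above. The function name add_arrays and the counter name tail are hypothetical, and the unrolled 4-at-a-time body is only a portable stand-in for real SIMD intrinsics; the actual test case is in the attachment. Whether a given GCC version still vectorizes the epilogue has to be checked on the generated assembly.

```c
#include <assert.h>

/* Hypothetical example: adds src[i] to dst[i] for i in [0, count).
 * The unrolled main loop stands in for a manually vectorized body. */
void add_arrays(int *dst, const int *src, int count)
{
    assert(count >= 0); /* keeps count - 3 from overflowing */
    int i = 0;

    /* Main loop: do 4 at a time. */
    for (; i < count - 3; i += 4) {
        dst[i + 0] += src[i + 0];
        dst[i + 1] += src[i + 1];
        dst[i + 2] += src[i + 2];
        dst[i + 3] += src[i + 3];
    }

    /* Epilogue: do 1 at a time. At most 3 elements remain here; the
     * extra counter `tail` states that bound explicitly, which is the
     * workaround from the report. */
    for (int tail = 0; tail < 3 && i < count; ++tail, ++i)
        dst[i] += src[i];
}
```

The extra counter does not change the result, since the main loop already leaves fewer than 4 elements; it only makes the trip-count bound visible in the loop condition itself.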