https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78114
--- Comment #5 from amker at gcc dot gnu.org --- This happens because the loop is vectorized with vf=2 under -mavx2, but with vf=4 under -march=haswell. In the latter case the peeled prolog iterates more than once, which results in the test failure.
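
To illustrate why vf matters here, the following is a minimal sketch, assuming the prolog is peeled for alignment (the comment above does not say which kind of peeling is involved). The helper prolog_iters is hypothetical and is not GCC's code; it only models the usual arithmetic that a prolog runs as many scalar iterations as are needed to reach a boundary of vf elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from GCC: number of scalar iterations
   a peeled prolog runs so that the access starting at `addr` becomes
   aligned to vf * elem_size bytes.  Assumes vf is a power of two and
   addr is already aligned to the element size.  */
static unsigned prolog_iters(uintptr_t addr, size_t elem_size, unsigned vf)
{
    assert(vf != 0 && (vf & (vf - 1)) == 0);
    assert(elem_size != 0 && addr % elem_size == 0);

    /* Misalignment of the first access, counted in elements.  */
    unsigned misalign = (unsigned)((addr / elem_size) % vf);

    /* Iterations needed to reach the next vf-element boundary;
       the mask turns vf into 0 when already aligned.  */
    return (vf - misalign) & (vf - 1);
}
```

Under this model the result is always below vf, so with vf=2 the prolog runs at most once, while with vf=4 it can run up to three times. For example, for 8-byte elements starting one element past a boundary, prolog_iters(8, 8, 2) is 1 and prolog_iters(8, 8, 4) is 3.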