Hi,

As PR92464 shows, the recent vectorization cost adjustment for load insns is responsible for this regression. It changes the minimum profitable iteration count from 19 to 12, and the test case happens to hit that threshold. In actual runtime measurements, the vectorized version performs on par with the non-vectorized version (the previous behavior), so vectorizing at 12 iterations is actually fine. To keep the test case sensitive to high peeling cost, this patch adjusts the loop bound from 16 to 14.
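For anyone following the numbers, here is a rough sketch of the arithmetic. It assumes the test loop has the shape `for (i = OFF; i < N; i++)`, so that it runs N - OFF iterations, and that the vectorizer accepts a loop once its iteration count reaches the minimum profitable count. Both are assumptions made for illustration, not quotes from the test or from the cost model, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define OFF 4

/* Hypothetical helper: iteration count of a loop of the assumed shape
   `for (i = OFF; i < n; i++)`.  */
static int loop_iters(int n)
{
    return n > OFF ? n - OFF : 0;
}

/* Hypothetical helper: assumed profitability check, i.e. vectorize once
   the iteration count reaches the minimum profitable count.  */
static bool would_vectorize(int n, int min_profitable_iters)
{
    return loop_iters(n) >= min_profitable_iters;
}

/* Walk through the numbers quoted in the mail.  */
static void check_mail_numbers(void)
{
    assert(loop_iters(16) == 12);      /* N = 16, OFF = 4 */
    assert(!would_vectorize(16, 19));  /* old threshold: not vectorized */
    assert(would_vectorize(16, 12));   /* new threshold: exactly hit */
    assert(!would_vectorize(14, 12));  /* N = 14: back below the threshold */
}
```

Under these assumptions, N = 14 leaves 10 iterations, which is below the new threshold of 12, so the test would again exercise the "not profitable" path.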
Verified on ppc64-redhat-linux (BE P7) and powerpc64le-linux-gnu (LE P8).

BR,
Kewen

-----

gcc/testsuite/ChangeLog

2019-11-13  Kewen Lin  <li...@gcc.gnu.org>

	PR target/92464
	* gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c: Adjust loop bound
	due to load cost adjustment.

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
index 4a7da2e..1bb064e 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
@@ -4,7 +4,7 @@
 #include <stdarg.h>
 #include "../../tree-vect.h"
 
-#define N 16
+#define N 14
 #define OFF 4
 
 /* Check handling of accesses for which the "initial condition" -