Hi,
my core tuning patch has caused a regression in gcc.target/i386/pr61403.c, which I missed in my testing. The testcase looks for a blend instruction, which is no longer output. The reason is that the loop is now vectorized with SLP, whereas before my changes the cost model claimed SLP vectorization was not profitable and the vectorizer disabled it.
I have tested that on Skylake the new code is about 11% faster. The PR itself is only about vectorizing the loop. I am not quite sure what the intention of the testcase was, but perhaps we can just check for the vectorized sqrt, which is output in either case?

Honza

Index: ../../gcc/testsuite/gcc.target/i386/pr61403.c
===================================================================
--- ../../gcc/testsuite/gcc.target/i386/pr61403.c	(revision 253935)
+++ ../../gcc/testsuite/gcc.target/i386/pr61403.c	(working copy)
@@ -23,4 +23,4 @@ norm (struct XYZ *in, struct XYZ *out, i
   }
 }
 
-/* { dg-final { scan-assembler "blend" } } */
+/* { dg-final { scan-assembler "rsqrtps" } } */