On Tue, 7 May 2013, Richard Biener wrote:

> 
> The following patch is a first step towards being able to enable
> vectorizing of a subset of all vectorizable functions at -O2 by
> default.  Analysis of Polyhedron (loop heavy code) shows that
> the cost of doing vectorizer analysis is in the noise but compile-time
> (and binary size) grows only with the number of loops we emit
> vectorized code for (because we generate up to 3 extra loops for
> each vectorizable loop that we have to move down the pass pipeline).
> 
> This very first patch makes sure that a runtime cost model check
> comes first - and not after alias or alignment versioning checks.
> That part of the patch would be a no-op without the rest because
> currently peeling for alignment and versioning cannot coexist
> (well - they could not, before I introduced
> STMT_VINFO_LOOP_PHI_EVOLUTION_PART a few releases ago ...).  Thus
> the patch enables doing both which may eventually speed things up.
> 
> So far tested on the testsuite and polyhedron, full bootstrap and
> regtest is underway.
> 
> Patches in this series will transform -fvect-cost-model to get
> an additional parameter (not sure what that will end up looking like)
> to be able to control vectorization in the following 'steps'
> 
>  1 vectorize loops that do not require a runtime cost check to be
>    profitable and that do not require versioning (to be enabled at -O2)
>  2 vectorize loops like we do now
>  3 vectorize loops like we do now but assume the runtime cost check
>    will always succeed (thus, omit it) [-fno-vect-cost-model]
> 
> I'm not sure yet if restricting the versioning makes sense
> (it's supposed to reduce code bloat and compile-time of course),
> esp. considering that peeling for alignment could be disabled as well
> (but then HW without unaligned accesses will likely vectorize nothing).
> Thus the complication seems to be the code size considerations
> (the cost model is currently set up to compare runtime cost only).
> 
> Richard.
> 
> 2013-05-07 Richard Biener <rguent...@suse.de>
> 
>     * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Do not
>     disable peeling when we version for aliasing.
>     * tree-vect-loop-manip.c (vect_can_advance_ivs_p): Use
>     STMT_VINFO_LOOP_PHI_EVOLUTION_PART instead of recomputing it.
>     * tree-vect-loop.c (vect_transform_loop): First apply versioning,
>     then peeling to arrange for the cost-model check to come first.

This runs into issues with gcc.target/i386/l_fma_double_1.c and friends
because those tests now get peeling for alignment instead of just
versioning for alias.  And peeling for alignment is only applied on
x86_64 for double because on i686 alignment might not be reachable
with peeling.  Trying to fix this up in the testcase with a properly
aligned double type shows that we don't honor user alignment in the
vectorizer for this purpose - thus, fixed, to allow not too ugly
adjustments of the fma testcases.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2013-05-10 Richard Biener <rguent...@suse.de>

    * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Do not
    disable peeling when we version for aliasing.
    (vector_alignment_reachable_p): Honor explicit user alignment.
    (vect_supportable_dr_alignment): Likewise.
    * tree-vect-loop-manip.c (vect_can_advance_ivs_p): Use
    STMT_VINFO_LOOP_PHI_EVOLUTION_PART instead of recomputing it.
    * tree-vect-loop.c (vect_transform_loop): First apply versioning,
    then peeling to arrange for the cost-model check to come first.

    * gcc.target/i386/avx256-unaligned-load-2.c: Make well-defined.
    * gcc.target/i386/l_fma_double_1.c: Adjust.
    * gcc.target/i386/l_fma_double_2.c: Likewise.
    * gcc.target/i386/l_fma_double_3.c: Likewise.
    * gcc.target/i386/l_fma_double_4.c: Likewise.
    * gcc.target/i386/l_fma_double_5.c: Likewise.
    * gcc.target/i386/l_fma_double_6.c: Likewise.
    * gcc.target/i386/l_fma_float_1.c: Likewise.
    * gcc.target/i386/l_fma_float_2.c: Likewise.
    * gcc.target/i386/l_fma_float_3.c: Likewise.
    * gcc.target/i386/l_fma_float_4.c: Likewise.
    * gcc.target/i386/l_fma_float_5.c: Likewise.
    * gcc.target/i386/l_fma_float_6.c: Likewise.

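
To make the testcase adjustment concrete, here is a minimal sketch of the idiom the l_fma_double_*.c changes in this patch rely on: a double typedef that carries an explicit user alignment, which as far as I understand is what makes TYPE_USER_ALIGN true for the type.  The kernel fma_loop is a hypothetical stand-in and is not taken from l_fma_1.h; only the typedef mirrors the patch.

```c
#include <stddef.h>

/* Same typedef as in the adjusted tests: the alignment of the element
   type is stated explicitly by the user.  */
typedef double adouble __attribute__((aligned(sizeof (double))));

/* Hypothetical kernel (an assumption, not the contents of l_fma_1.h).
   The pointers are not restrict-qualified, so a vectorizer has to
   version the loop for aliasing; with the patch it may peel for
   alignment as well.  */
void
fma_loop (adouble *out, const adouble *a, const adouble *b,
          const adouble *c, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = a[i] * b[i] + c[i];
}
```

Whether a given compiler version actually peels this loop depends on the target and options, so treat it as an illustration of the source-level idiom only.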
Index: gcc/tree-vect-data-refs.c
===================================================================
*** gcc/tree-vect-data-refs.c.orig 2013-05-08 13:26:16.000000000 +0200
--- gcc/tree-vect-data-refs.c 2013-05-08 14:23:42.937265437 +0200
*************** vector_alignment_reachable_p (struct dat
*** 1024,1030 ****
        if (dump_enabled_p ())
          dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                           "Unknown misalignment, is_packed = %d",is_packed);
!       if (targetm.vectorize.vector_alignment_reachable (type, is_packed))
          return true;
        else
          return false;
--- 1024,1031 ----
        if (dump_enabled_p ())
          dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                           "Unknown misalignment, is_packed = %d",is_packed);
!       if ((TYPE_USER_ALIGN (type) && !is_packed)
!           || targetm.vectorize.vector_alignment_reachable (type, is_packed))
          return true;
        else
          return false;
*************** vect_enhance_data_refs_alignment (loop_v
*** 1323,1329 ****
    bool stat;
    gimple stmt;
    stmt_vec_info stmt_info;
-   int vect_versioning_for_alias_required;
    unsigned int npeel = 0;
    bool all_misalignments_unknown = true;
    unsigned int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
--- 1324,1329 ----
*************** vect_enhance_data_refs_alignment (loop_v
*** 1510,1524 ****
          }
      }
  
!   vect_versioning_for_alias_required
!     = LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo);
! 
!   /* Temporarily, if versioning for alias is required, we disable peeling
!      until we support peeling and versioning.  Often peeling for alignment
!      will require peeling for loop-bound, which in turn requires that we
!      know how to adjust the loop ivs after the loop.  */
!   if (vect_versioning_for_alias_required
!       || !vect_can_advance_ivs_p (loop_vinfo)
        || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
      do_peeling = false;
  
--- 1510,1517 ----
          }
      }
  
!   /* Check if we can possibly peel the loop.  */
!   if (!vect_can_advance_ivs_p (loop_vinfo)
        || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
      do_peeling = false;
  
*************** vect_supportable_dr_alignment (struct da
*** 4722,4730 ****
            if (!known_alignment_for_access_p (dr))
              is_packed = not_size_aligned (DR_REF (dr));
  
!           if (targetm.vectorize.
!               support_vector_misalignment (mode, type,
!                                            DR_MISALIGNMENT (dr), is_packed))
              /* Can't software pipeline the loads, but can at least do them.  */
              return dr_unaligned_supported;
          }
--- 4715,4724 ----
            if (!known_alignment_for_access_p (dr))
              is_packed = not_size_aligned (DR_REF (dr));
  
!           if ((TYPE_USER_ALIGN (type) && !is_packed)
!               || targetm.vectorize.
!                  support_vector_misalignment (mode, type,
!                                               DR_MISALIGNMENT (dr), is_packed))
              /* Can't software pipeline the loads, but can at least do them.  */
              return dr_unaligned_supported;
          }
*************** vect_supportable_dr_alignment (struct da
*** 4736,4744 ****
        if (!known_alignment_for_access_p (dr))
          is_packed = not_size_aligned (DR_REF (dr));
  
!      if (targetm.vectorize.
!          support_vector_misalignment (mode, type,
!                                       DR_MISALIGNMENT (dr), is_packed))
         return dr_unaligned_supported;
      }
  
--- 4730,4739 ----
        if (!known_alignment_for_access_p (dr))
          is_packed = not_size_aligned (DR_REF (dr));
  
!      if ((TYPE_USER_ALIGN (type) && !is_packed)
!          || targetm.vectorize.
!             support_vector_misalignment (mode, type,
!                                          DR_MISALIGNMENT (dr), is_packed))
         return dr_unaligned_supported;
      }
  
Index: gcc/tree-vect-loop.c
===================================================================
*** gcc/tree-vect-loop.c.orig 2013-05-08 13:26:16.000000000 +0200
--- gcc/tree-vect-loop.c 2013-05-08 13:33:14.445119190 +0200
*************** vect_transform_loop (loop_vec_info loop_
*** 5499,5517 ****
        check_profitability = true;
      }
  
!   /* Peel the loop if there are data refs with unknown alignment.
!      Only one data ref with unknown store is allowed.  */
  
!   if (LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo))
      {
!       vect_do_peeling_for_alignment (loop_vinfo, th, check_profitability);
        check_profitability = false;
      }
  
!   if (LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT (loop_vinfo)
!       || LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
      {
!       vect_loop_versioning (loop_vinfo, th, check_profitability);
        check_profitability = false;
      }
  
--- 5499,5520 ----
        check_profitability = true;
      }
  
!   /* Version the loop first, if required, so the profitability check
!      comes first.  */
!   if (LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT (loop_vinfo)
!       || LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
      {
!       vect_loop_versioning (loop_vinfo, th, check_profitability);
        check_profitability = false;
      }
  
!   /* Peel the loop if there are data refs with unknown alignment.
!      Only one data ref with unknown store is allowed.  */
! 
!   if (LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo))
      {
!       vect_do_peeling_for_alignment (loop_vinfo, th, check_profitability);
        check_profitability = false;
      }
  
Index: gcc/tree-vect-loop-manip.c
===================================================================
*** gcc/tree-vect-loop-manip.c.orig 2013-05-08 13:26:16.000000000 +0200
--- gcc/tree-vect-loop-manip.c 2013-05-08 13:33:14.446119201 +0200
*************** vect_can_advance_ivs_p (loop_vec_info lo
*** 1555,1561 ****
      dump_printf_loc (MSG_NOTE, vect_location, "vect_can_advance_ivs_p:");
    for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
      {
-       tree access_fn = NULL;
        tree evolution_part;
  
        phi = gsi_stmt (gsi);
--- 1555,1560 ----
*************** vect_can_advance_ivs_p (loop_vec_info lo
*** 1588,1618 ****
  
        /* Analyze the evolution function.  */
  
!       access_fn = instantiate_parameters
!         (loop, analyze_scalar_evolution (loop, PHI_RESULT (phi)));
! 
!       if (!access_fn)
!         {
!           if (dump_enabled_p ())
!             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
!                              "No Access function.");
!           return false;
!         }
! 
!       STRIP_NOPS (access_fn);
!       if (dump_enabled_p ())
!         {
!           dump_printf_loc (MSG_NOTE, vect_location,
!                            "Access function of PHI: ");
!           dump_generic_expr (MSG_NOTE, TDF_SLIM, access_fn);
!         }
! 
!       evolution_part = evolution_part_in_loop_num (access_fn, loop->num);
! 
        if (evolution_part == NULL_TREE)
          {
            if (dump_enabled_p ())
!             dump_printf (MSG_MISSED_OPTIMIZATION, "No evolution.");
            return false;
          }
  
--- 1587,1599 ----
  
        /* Analyze the evolution function.  */
  
!       evolution_part
!         = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (vinfo_for_stmt (phi));
        if (evolution_part == NULL_TREE)
          {
            if (dump_enabled_p ())
!             dump_printf (MSG_MISSED_OPTIMIZATION,
!                          "No access function or evolution.");
            return false;
          }
  
Index: gcc/testsuite/gcc.target/i386/avx256-unaligned-load-2.c
===================================================================
*** gcc/testsuite/gcc.target/i386/avx256-unaligned-load-2.c.orig 2013-05-08 13:26:16.000000000 +0200
--- gcc/testsuite/gcc.target/i386/avx256-unaligned-load-2.c 2013-05-08 13:33:14.446119201 +0200
***************
*** 1,26 ****
  /* { dg-do compile { target { ! ia32 } } } */
  /* { dg-options "-O3 -dp -mavx -mavx256-split-unaligned-load" } */
  
- #define N 1024
- 
- char **ep;
- char **fp;
- 
  void
! avx_test (void)
  {
    int i;
!   char **ap;
!   char **bp;
!   char **cp;
! 
!   ap = ep;
!   bp = fp;
!   for (i = 128; i >= 0; i--)
!     {
!       *ap++ = *cp++;
!       *bp++ = 0;
!     }
  }
  
  /* { dg-final { scan-assembler-not "avx_loaddqu256" } } */
--- 1,13 ----
  /* { dg-do compile { target { ! ia32 } } } */
  /* { dg-options "-O3 -dp -mavx -mavx256-split-unaligned-load" } */
  
  void
! avx_test (char **cp, char **ep)
  {
    int i;
!   char **ap = __builtin_assume_aligned (ep, 32);
!   for (i = 128; i > 0; i--)
!     *ap++ = *cp++;
  }
  
  /* { dg-final { scan-assembler-not "avx_loaddqu256" } } */
Index: gcc/testsuite/gcc.target/i386/l_fma_double_1.c
===================================================================
*** gcc/testsuite/gcc.target/i386/l_fma_double_1.c.orig 2013-05-08 13:26:16.000000000 +0200
--- gcc/testsuite/gcc.target/i386/l_fma_double_1.c 2013-05-08 14:05:06.520670173 +0200
***************
*** 4,26 ****
  /* Test that the compiler properly optimizes floating point multiply and add
     instructions into FMA3 instructions.  */
  
! #define TYPE double
  
  #include "l_fma_1.h"
  
  /* { dg-final { scan-assembler-times "vfmadd132pd" 4 } } */
! /* { dg-final { scan-assembler-times "vfmadd231pd" 4 } } */
  /* { dg-final { scan-assembler-times "vfmsub132pd" 4 } } */
! /* { dg-final { scan-assembler-times "vfmsub231pd" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmadd132pd" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmadd231pd" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmsub132pd" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmsub231pd" 4 } } */
! /* { dg-final { scan-assembler-times "vfmadd132sd" 16 } } */
! /* { dg-final { scan-assembler-times "vfmadd213sd" 16 } } */
! /* { dg-final { scan-assembler-times "vfmsub132sd" 16 } } */
! /* { dg-final { scan-assembler-times "vfmsub213sd" 16 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132sd" 16 } } */
! /* { dg-final { scan-assembler-times "vfnmadd213sd" 16 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132sd" 16 } } */
! /* { dg-final { scan-assembler-times "vfnmsub213sd" 16 } } */
--- 4,27 ----
  /* Test that the compiler properly optimizes floating point multiply and add
     instructions into FMA3 instructions.  */
  
! typedef double adouble __attribute__((aligned(sizeof (double))));
! #define TYPE adouble
  
  #include "l_fma_1.h"
  
  /* { dg-final { scan-assembler-times "vfmadd132pd" 4 } } */
! /* { dg-final { scan-assembler-times "vfmadd213pd" 4 } } */
  /* { dg-final { scan-assembler-times "vfmsub132pd" 4 } } */
! /* { dg-final { scan-assembler-times "vfmsub213pd" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmadd132pd" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmadd231pd" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmsub132pd" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmsub231pd" 4 } } */
! /* { dg-final { scan-assembler-times "vfmadd132sd" 28 } } */
! /* { dg-final { scan-assembler-times "vfmadd213sd" 28 } } */
! /* { dg-final { scan-assembler-times "vfmsub132sd" 28 } } */
! /* { dg-final { scan-assembler-times "vfmsub213sd" 28 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132sd" 28 } } */
! /* { dg-final { scan-assembler-times "vfnmadd213sd" 28 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132sd" 28 } } */
! /* { dg-final { scan-assembler-times "vfnmsub213sd" 28 } } */
Index: gcc/testsuite/gcc.target/i386/l_fma_double_2.c
===================================================================
*** gcc/testsuite/gcc.target/i386/l_fma_double_2.c.orig 2013-05-08 13:26:16.000000000 +0200
--- gcc/testsuite/gcc.target/i386/l_fma_double_2.c 2013-05-08 14:24:27.513768881 +0200
***************
*** 4,10 ****
  /* Test that the compiler properly optimizes floating point multiply and add
     instructions into FMA3 instructions.  */
  
! #define TYPE double
  
  #include "l_fma_2.h"
  
--- 4,11 ----
  /* Test that the compiler properly optimizes floating point multiply and add
     instructions into FMA3 instructions.  */
  
! typedef double adouble __attribute__((aligned(sizeof (double))));
! #define TYPE adouble
  
  #include "l_fma_2.h"
  
***************
*** 12,18 ****
  /* { dg-final { scan-assembler-times "vfmsub132pd" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmadd132pd" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmsub132pd" 8 } } */
! /* { dg-final { scan-assembler-times "vfmadd132sd" 32 } } */
! /* { dg-final { scan-assembler-times "vfmsub132sd" 32 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132sd" 32 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132sd" 32 } } */
--- 13,19 ----
  /* { dg-final { scan-assembler-times "vfmsub132pd" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmadd132pd" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmsub132pd" 8 } } */
! /* { dg-final { scan-assembler-times "vfmadd132sd" 56 } } */
! /* { dg-final { scan-assembler-times "vfmsub132sd" 56 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132sd" 56 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132sd" 56 } } */
Index: gcc/testsuite/gcc.target/i386/l_fma_double_3.c
===================================================================
*** gcc/testsuite/gcc.target/i386/l_fma_double_3.c.orig 2013-05-08 13:26:16.000000000 +0200
--- gcc/testsuite/gcc.target/i386/l_fma_double_3.c 2013-05-08 14:24:43.541949905 +0200
***************
*** 4,26 ****
  /* Test that the compiler properly optimizes floating point multiply and add
     instructions into FMA3 instructions.  */
  
! #define TYPE double
  
  #include "l_fma_3.h"
  
  /* { dg-final { scan-assembler-times "vfmadd132pd" 4 } } */
! /* { dg-final { scan-assembler-times "vfmadd231pd" 4 } } */
  /* { dg-final { scan-assembler-times "vfmsub132pd" 4 } } */
! /* { dg-final { scan-assembler-times "vfmsub231pd" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmadd132pd" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmadd231pd" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmsub132pd" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmsub231pd" 4 } } */
! /* { dg-final { scan-assembler-times "vfmadd132sd" 16 } } */
! /* { dg-final { scan-assembler-times "vfmadd213sd" 16 } } */
! /* { dg-final { scan-assembler-times "vfmsub132sd" 16 } } */
! /* { dg-final { scan-assembler-times "vfmsub213sd" 16 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132sd" 16 } } */
! /* { dg-final { scan-assembler-times "vfnmadd213sd" 16 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132sd" 16 } } */
! /* { dg-final { scan-assembler-times "vfnmsub213sd" 16 } } */
--- 4,27 ----
  /* Test that the compiler properly optimizes floating point multiply and add
     instructions into FMA3 instructions.  */
  
! typedef double adouble __attribute__((aligned(sizeof (double))));
! #define TYPE adouble
  
  #include "l_fma_3.h"
  
  /* { dg-final { scan-assembler-times "vfmadd132pd" 4 } } */
! /* { dg-final { scan-assembler-times "vfmadd213pd" 4 } } */
  /* { dg-final { scan-assembler-times "vfmsub132pd" 4 } } */
! /* { dg-final { scan-assembler-times "vfmsub213pd" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmadd132pd" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmadd231pd" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmsub132pd" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmsub231pd" 4 } } */
! /* { dg-final { scan-assembler-times "vfmadd132sd" 28 } } */
! /* { dg-final { scan-assembler-times "vfmadd213sd" 28 } } */
! /* { dg-final { scan-assembler-times "vfmsub132sd" 28 } } */
! /* { dg-final { scan-assembler-times "vfmsub213sd" 28 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132sd" 28 } } */
! /* { dg-final { scan-assembler-times "vfnmadd213sd" 28 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132sd" 28 } } */
! /* { dg-final { scan-assembler-times "vfnmsub213sd" 28 } } */
Index: gcc/testsuite/gcc.target/i386/l_fma_double_4.c
===================================================================
*** gcc/testsuite/gcc.target/i386/l_fma_double_4.c.orig 2013-05-08 13:26:16.000000000 +0200
--- gcc/testsuite/gcc.target/i386/l_fma_double_4.c 2013-05-08 14:24:47.562995313 +0200
***************
*** 4,10 ****
  /* Test that the compiler properly optimizes floating point multiply and add
     instructions into FMA3 instructions.  */
  
! #define TYPE double
  
  #include "l_fma_4.h"
  
--- 4,11 ----
  /* Test that the compiler properly optimizes floating point multiply and add
     instructions into FMA3 instructions.  */
  
! typedef double adouble __attribute__((aligned(sizeof (double))));
! #define TYPE adouble
  
  #include "l_fma_4.h"
  
***************
*** 12,18 ****
  /* { dg-final { scan-assembler-times "vfmsub132pd" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmadd132pd" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmsub132pd" 8 } } */
! /* { dg-final { scan-assembler-times "vfmadd132sd" 32 } } */
! /* { dg-final { scan-assembler-times "vfmsub132sd" 32 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132sd" 32 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132sd" 32 } } */
--- 13,19 ----
  /* { dg-final { scan-assembler-times "vfmsub132pd" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmadd132pd" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmsub132pd" 8 } } */
! /* { dg-final { scan-assembler-times "vfmadd132sd" 56 } } */
! /* { dg-final { scan-assembler-times "vfmsub132sd" 56 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132sd" 56 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132sd" 56 } } */
Index: gcc/testsuite/gcc.target/i386/l_fma_double_5.c
===================================================================
*** gcc/testsuite/gcc.target/i386/l_fma_double_5.c.orig 2013-05-08 13:26:16.000000000 +0200
--- gcc/testsuite/gcc.target/i386/l_fma_double_5.c 2013-05-08 14:24:54.507073710 +0200
***************
*** 4,10 ****
  /* Test that the compiler properly optimizes floating point multiply and add
     instructions into FMA3 instructions.  */
  
! #define TYPE double
  
  #include "l_fma_5.h"
  
--- 4,11 ----
  /* Test that the compiler properly optimizes floating point multiply and add
     instructions into FMA3 instructions.  */
  
! typedef double adouble __attribute__((aligned(sizeof (double))));
! #define TYPE adouble
  
  #include "l_fma_5.h"
  
***************
*** 12,18 ****
  /* { dg-final { scan-assembler-times "vfmsub132pd" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmadd132pd" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmsub132pd" 8 } } */
! /* { dg-final { scan-assembler-times "vfmadd132sd" 32 } } */
! /* { dg-final { scan-assembler-times "vfmsub132sd" 32 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132sd" 32 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132sd" 32 } } */
--- 13,19 ----
  /* { dg-final { scan-assembler-times "vfmsub132pd" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmadd132pd" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmsub132pd" 8 } } */
! /* { dg-final { scan-assembler-times "vfmadd132sd" 56 } } */
! /* { dg-final { scan-assembler-times "vfmsub132sd" 56 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132sd" 56 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132sd" 56 } } */
Index: gcc/testsuite/gcc.target/i386/l_fma_double_6.c
===================================================================
*** gcc/testsuite/gcc.target/i386/l_fma_double_6.c.orig 2013-05-08 13:26:16.000000000 +0200
--- gcc/testsuite/gcc.target/i386/l_fma_double_6.c 2013-05-08 14:24:57.838111351 +0200
***************
*** 4,10 ****
  /* Test that the compiler properly optimizes floating point multiply and add
     instructions into FMA3 instructions.  */
  
! #define TYPE double
  
  #include "l_fma_6.h"
  
--- 4,11 ----
  /* Test that the compiler properly optimizes floating point multiply and add
     instructions into FMA3 instructions.  */
  
! typedef double adouble __attribute__((aligned(sizeof (double))));
! #define TYPE adouble
  
  #include "l_fma_6.h"
  
***************
*** 12,18 ****
  /* { dg-final { scan-assembler-times "vfmsub132pd" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmadd132pd" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmsub132pd" 8 } } */
! /* { dg-final { scan-assembler-times "vfmadd132sd" 32 } } */
! /* { dg-final { scan-assembler-times "vfmsub132sd" 32 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132sd" 32 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132sd" 32 } } */
--- 13,19 ----
  /* { dg-final { scan-assembler-times "vfmsub132pd" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmadd132pd" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmsub132pd" 8 } } */
! /* { dg-final { scan-assembler-times "vfmadd132sd" 56 } } */
! /* { dg-final { scan-assembler-times "vfmsub132sd" 56 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132sd" 56 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132sd" 56 } } */
Index: gcc/testsuite/gcc.target/i386/l_fma_float_1.c
===================================================================
*** gcc/testsuite/gcc.target/i386/l_fma_float_1.c.orig 2013-05-08 13:26:16.000000000 +0200
--- gcc/testsuite/gcc.target/i386/l_fma_float_1.c 2013-05-08 13:33:14.447119212 +0200
***************
*** 9,26 ****
  #include "l_fma_1.h"
  
  /* { dg-final { scan-assembler-times "vfmadd132ps" 4 } } */
! /* { dg-final { scan-assembler-times "vfmadd231ps" 4 } } */
  /* { dg-final { scan-assembler-times "vfmsub132ps" 4 } } */
! /* { dg-final { scan-assembler-times "vfmsub231ps" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmadd132ps" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmadd231ps" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmsub132ps" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmsub231ps" 4 } } */
! /* { dg-final { scan-assembler-times "vfmadd132ss" 32 } } */
! /* { dg-final { scan-assembler-times "vfmadd213ss" 32 } } */
! /* { dg-final { scan-assembler-times "vfmsub132ss" 32 } } */
! /* { dg-final { scan-assembler-times "vfmsub213ss" 32 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132ss" 32 } } */
! /* { dg-final { scan-assembler-times "vfnmadd213ss" 32 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132ss" 32 } } */
! /* { dg-final { scan-assembler-times "vfnmsub213ss" 32 } } */
--- 9,26 ----
  #include "l_fma_1.h"
  
  /* { dg-final { scan-assembler-times "vfmadd132ps" 4 } } */
! /* { dg-final { scan-assembler-times "vfmadd213ps" 4 } } */
  /* { dg-final { scan-assembler-times "vfmsub132ps" 4 } } */
! /* { dg-final { scan-assembler-times "vfmsub213ps" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmadd132ps" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmadd231ps" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmsub132ps" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmsub231ps" 4 } } */
! /* { dg-final { scan-assembler-times "vfmadd132ss" 60 } } */
! /* { dg-final { scan-assembler-times "vfmadd213ss" 60 } } */
! /* { dg-final { scan-assembler-times "vfmsub132ss" 60 } } */
! /* { dg-final { scan-assembler-times "vfmsub213ss" 60 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132ss" 60 } } */
! /* { dg-final { scan-assembler-times "vfnmadd213ss" 60 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132ss" 60 } } */
! /* { dg-final { scan-assembler-times "vfnmsub213ss" 60 } } */
Index: gcc/testsuite/gcc.target/i386/l_fma_float_2.c
===================================================================
*** gcc/testsuite/gcc.target/i386/l_fma_float_2.c.orig 2013-05-08 13:26:16.000000000 +0200
--- gcc/testsuite/gcc.target/i386/l_fma_float_2.c 2013-05-08 13:33:14.448119223 +0200
***************
*** 12,18 ****
  /* { dg-final { scan-assembler-times "vfmsub132ps" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmadd132ps" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmsub132ps" 8 } } */
! /* { dg-final { scan-assembler-times "vfmadd132ss" 64 } } */
! /* { dg-final { scan-assembler-times "vfmsub132ss" 64 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132ss" 64 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132ss" 64 } } */
--- 12,18 ----
  /* { dg-final { scan-assembler-times "vfmsub132ps" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmadd132ps" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmsub132ps" 8 } } */
! /* { dg-final { scan-assembler-times "vfmadd132ss" 120 } } */
! /* { dg-final { scan-assembler-times "vfmsub132ss" 120 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132ss" 120 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132ss" 120 } } */
Index: gcc/testsuite/gcc.target/i386/l_fma_float_3.c
===================================================================
*** gcc/testsuite/gcc.target/i386/l_fma_float_3.c.orig 2013-05-08 13:26:16.000000000 +0200
--- gcc/testsuite/gcc.target/i386/l_fma_float_3.c 2013-05-08 13:33:14.448119223 +0200
***************
*** 9,26 ****
  #include "l_fma_3.h"
  
  /* { dg-final { scan-assembler-times "vfmadd132ps" 4 } } */
! /* { dg-final { scan-assembler-times "vfmadd231ps" 4 } } */
  /* { dg-final { scan-assembler-times "vfmsub132ps" 4 } } */
! /* { dg-final { scan-assembler-times "vfmsub231ps" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmadd132ps" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmadd231ps" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmsub132ps" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmsub231ps" 4 } } */
! /* { dg-final { scan-assembler-times "vfmadd132ss" 32 } } */
! /* { dg-final { scan-assembler-times "vfmadd213ss" 32 } } */
! /* { dg-final { scan-assembler-times "vfmsub132ss" 32 } } */
! /* { dg-final { scan-assembler-times "vfmsub213ss" 32 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132ss" 32 } } */
! /* { dg-final { scan-assembler-times "vfnmadd213ss" 32 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132ss" 32 } } */
! /* { dg-final { scan-assembler-times "vfnmsub213ss" 32 } } */
--- 9,26 ----
  #include "l_fma_3.h"
  
  /* { dg-final { scan-assembler-times "vfmadd132ps" 4 } } */
! /* { dg-final { scan-assembler-times "vfmadd213ps" 4 } } */
  /* { dg-final { scan-assembler-times "vfmsub132ps" 4 } } */
! /* { dg-final { scan-assembler-times "vfmsub213ps" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmadd132ps" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmadd231ps" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmsub132ps" 4 } } */
  /* { dg-final { scan-assembler-times "vfnmsub231ps" 4 } } */
! /* { dg-final { scan-assembler-times "vfmadd132ss" 60 } } */
! /* { dg-final { scan-assembler-times "vfmadd213ss" 60 } } */
! /* { dg-final { scan-assembler-times "vfmsub132ss" 60 } } */
! /* { dg-final { scan-assembler-times "vfmsub213ss" 60 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132ss" 60 } } */
! /* { dg-final { scan-assembler-times "vfnmadd213ss" 60 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132ss" 60 } } */
! /* { dg-final { scan-assembler-times "vfnmsub213ss" 60 } } */
Index: gcc/testsuite/gcc.target/i386/l_fma_float_4.c
===================================================================
*** gcc/testsuite/gcc.target/i386/l_fma_float_4.c.orig 2013-05-08 13:26:16.000000000 +0200
--- gcc/testsuite/gcc.target/i386/l_fma_float_4.c 2013-05-08 13:33:14.448119223 +0200
***************
*** 12,18 ****
  /* { dg-final { scan-assembler-times "vfmsub132ps" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmadd132ps" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmsub132ps" 8 } } */
! /* { dg-final { scan-assembler-times "vfmadd132ss" 64 } } */
! /* { dg-final { scan-assembler-times "vfmsub132ss" 64 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132ss" 64 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132ss" 64 } } */
--- 12,18 ----
  /* { dg-final { scan-assembler-times "vfmsub132ps" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmadd132ps" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmsub132ps" 8 } } */
! /* { dg-final { scan-assembler-times "vfmadd132ss" 120 } } */
! /* { dg-final { scan-assembler-times "vfmsub132ss" 120 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132ss" 120 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132ss" 120 } } */
Index: gcc/testsuite/gcc.target/i386/l_fma_float_5.c
===================================================================
*** gcc/testsuite/gcc.target/i386/l_fma_float_5.c.orig 2013-05-08 13:26:16.000000000 +0200
--- gcc/testsuite/gcc.target/i386/l_fma_float_5.c 2013-05-08 13:33:14.448119223 +0200
***************
*** 12,18 ****
  /* { dg-final { scan-assembler-times "vfmsub132ps" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmadd132ps" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmsub132ps" 8 } } */
! /* { dg-final { scan-assembler-times "vfmadd132ss" 64 } } */
! /* { dg-final { scan-assembler-times "vfmsub132ss" 64 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132ss" 64 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132ss" 64 } } */
--- 12,18 ----
  /* { dg-final { scan-assembler-times "vfmsub132ps" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmadd132ps" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmsub132ps" 8 } } */
! /* { dg-final { scan-assembler-times "vfmadd132ss" 120 } } */
! /* { dg-final { scan-assembler-times "vfmsub132ss" 120 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132ss" 120 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132ss" 120 } } */
Index: gcc/testsuite/gcc.target/i386/l_fma_float_6.c
===================================================================
*** gcc/testsuite/gcc.target/i386/l_fma_float_6.c.orig 2013-05-08 13:26:16.000000000 +0200
--- gcc/testsuite/gcc.target/i386/l_fma_float_6.c 2013-05-08 13:33:14.448119223 +0200
***************
*** 12,18 ****
  /* { dg-final { scan-assembler-times "vfmsub132ps" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmadd132ps" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmsub132ps" 8 } } */
! /* { dg-final { scan-assembler-times "vfmadd132ss" 64 } } */
! /* { dg-final { scan-assembler-times "vfmsub132ss" 64 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132ss" 64 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132ss" 64 } } */
--- 12,18 ----
  /* { dg-final { scan-assembler-times "vfmsub132ps" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmadd132ps" 8 } } */
  /* { dg-final { scan-assembler-times "vfnmsub132ps" 8 } } */
! /* { dg-final { scan-assembler-times "vfmadd132ss" 120 } } */
! /* { dg-final { scan-assembler-times "vfmsub132ss" 120 } } */
! /* { dg-final { scan-assembler-times "vfnmadd132ss" 120 } } */
! /* { dg-final { scan-assembler-times "vfnmsub132ss" 120 } } */
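
For readers who want a picture of what the reordering in vect_transform_loop is after, the following is a hand-written C sketch of the loop shape, not compiler output.  It assumes a simple y[i] += a * x[i] loop; the function name axpy_shape, the vectorization factor VF and the threshold MIN_PROFITABLE_ITERS are all made up for illustration.  The versioning guard comes first and carries the cost-model check, and only the vectorized version then gets the alignment prologue and the epilogue (the "up to 3 extra loops" mentioned in the quoted mail).

```c
#include <stddef.h>
#include <stdint.h>

/* Both values are made up for this sketch; the real vectorizer derives
   them from the target and its cost model.  */
enum { VF = 4, MIN_PROFITABLE_ITERS = 8 };

/* Hypothetical hand-written model of the versioned and peeled loop.  */
void
axpy_shape (float *y, const float *x, float a, size_t n)
{
  uintptr_t xs = (uintptr_t) x, ys = (uintptr_t) y;
  size_t bytes = n * sizeof (float);
  size_t i = 0;

  /* Versioning guard: the cheap cost-model check is tested first,
     then the alias check.  */
  if (n >= MIN_PROFITABLE_ITERS
      && (xs + bytes <= ys || ys + bytes <= xs))
    {
      /* Prologue: peel scalar iterations until y is vector-aligned.  */
      while (i < n && (uintptr_t) (y + i) % (VF * sizeof (float)) != 0)
        {
          y[i] += a * x[i];
          i++;
        }

      /* Main body: stands in for the vector loop, VF elements a step.  */
      for (; i + VF <= n; i += VF)
        for (size_t j = 0; j < VF; j++)
          y[i + j] += a * x[i + j];

      /* Epilogue: remaining scalar iterations.  */
      for (; i < n; i++)
        y[i] += a * x[i];
    }
  else
    {
      /* Scalar fallback: the unmodified copy of the loop.  */
      for (; i < n; i++)
        y[i] += a * x[i];
    }
}
```

With the old order (peeling first), the prologue would have run before any profitability or alias test; in this shape a short or aliasing loop goes straight to the scalar copy.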