On Thu, 20 Dec 2012, Richard Biener wrote: > On Thu, 20 Dec 2012, Richard Biener wrote: > > > On Thu, 20 Dec 2012, Jakub Jelinek wrote: > > > > > On Thu, Dec 20, 2012 at 02:51:55PM +0100, Richard Biener wrote: > > > > In the PR we perform expression replacement of an FP operation > > > > across a builtin call that sets the FP control register. This > > > > patch restricts replacement across calls further, from allowing > > > > all builtins to only allowing those without side-effects. > > > > > > > > Allowing replacement over calls at all was to not pessimize > > > > FP code generation for example for sqrt which is most often > > > > expanded to a single instruction. > > > > > > > > Bootstrap and regtest running on x86_64-unknown-linux-gnu. > > > > > > > > Comments? > > > > > > Wouldn't it be better to have there a list of known builtins over which it > > > is fine to do TER? I'd bet most of memory or string builtins that don't > > > call malloc/free should be still ok, but they surely have side-effects. > > > > I'm not sure - the original reason was that replacing across calls > > made us spill more because there was a call. We agreed that replacing > > across calls isn't usually a good idea but put in the (admittedly bad) > > workaround to still allow doing so across likely-not-calls. > > string builtins generally will expand to calls though. > > > > I was thinking of even making it stronger and increment "cur_call_cnt" > > when the stmt (even non-call) has side-effects (would for example > > cover volatile asms or general volatile touching insns).
After discussing on IRC I am testing the following which adds a target hook and just treats ldmxcsr and stmxcsr differently as well as all volatile asms and internal functions. Bootstrap & regtest on x86_64-unknown-linux-gnu running. Ok for trunk? Thanks, Richard. 2012-12-20 Richard Biener <rguent...@suse.de> PR middle-end/55752 * target.def (sched): Add scheduling_barrier_p. * targhooks.c (default_scheduling_barrier_p): New function. * targhooks.h (default_scheduling_barrier_p): Declare. * doc/tm.texi.in (TARGET_SCHED_SCHEDULING_BARRIER_P): Add. * doc/tm.texi: Update. * tree-ssa-ter.c: Include target.h. (find_replaceable_in_bb): Do not schedule across volatile asms or stmts the target thinks are scheduling barriers. Do not treat internal functions as scheduling barrier by default. * i386/i386.c (TARGET_SCHED_SCHEDULING_BARRIER_P): Override. (ix86_scheduling_barrier_p): New function. Handle IX86_BUILTIN_LDMXCSR and IX86_BUILTIN_STMXCSR. * Makefile.in (tree-ssa-ter.o): Add $(TARGET_H) dependency. Index: gcc/target.def =================================================================== *** gcc/target.def (revision 194632) --- gcc/target.def (working copy) *************** parallelism required in output calculati *** 939,944 **** --- 939,954 ---- int, (unsigned int opc, enum machine_mode mode), hook_int_uint_mode_1) + /* The following member value is a function that returns whether + the statement is considered a barrier for scheduling. By default + this returns false. */ + DEFHOOK + (scheduling_barrier_p, + "This hook is called by TER to determine whether the statement is\n\ + a scheduling barrier.", + bool, (gimple stmt), + default_scheduling_barrier_p) + HOOK_VECTOR_END (sched) /* Functions relating to vectorization. */ Index: gcc/targhooks.c =================================================================== *** gcc/targhooks.c (revision 194632) --- gcc/targhooks.c (working copy) *************** default_canonicalize_comparison (int *, *** 1547,1550 **** --- 1547,1557 ---- { } + /* Default version of scheduling_barrier_p. */ + bool + default_scheduling_barrier_p (gimple) + { + return false; + } + #include "gt-targhooks.h" Index: gcc/targhooks.h =================================================================== *** gcc/targhooks.h (revision 194632) --- gcc/targhooks.h (working copy) *************** extern const char *default_pch_valid_p ( *** 195,197 **** --- 195,199 ---- extern void default_asm_output_ident_directive (const char*); extern bool default_member_type_forces_blk (const_tree, enum machine_mode); + + extern bool default_scheduling_barrier_p (gimple); Index: gcc/doc/tm.texi.in =================================================================== *** gcc/doc/tm.texi.in (revision 194632) --- gcc/doc/tm.texi.in (working copy) *************** in its second parameter. *** 6737,6742 **** --- 6737,6744 ---- @hook TARGET_SCHED_REASSOCIATION_WIDTH + @hook TARGET_SCHED_SCHEDULING_BARRIER_P + @node Sections @section Dividing the Output into Sections (Texts, Data, @dots{}) @c the above section title is WAY too long. maybe cut the part between Index: gcc/doc/tm.texi =================================================================== *** gcc/doc/tm.texi (revision 194632) --- gcc/doc/tm.texi (working copy) *************** This hook is called by tree reassociator *** 6840,6845 **** --- 6840,6850 ---- parallelism required in output calculations chain. @end deftypefn + @deftypefn {Target Hook} bool TARGET_SCHED_SCHEDULING_BARRIER_P (gimple @var{stmt}) + This hook is called by TER to determine whether the statement is + a scheduling barrier. + @end deftypefn + @node Sections @section Dividing the Output into Sections (Texts, Data, @dots{}) @c the above section title is WAY too long. maybe cut the part between Index: gcc/tree-ssa-ter.c =================================================================== *** gcc/tree-ssa-ter.c (revision 194632) --- gcc/tree-ssa-ter.c (working copy) *************** along with GCC; see the file COPYING3. *** 31,36 **** --- 31,37 ---- #include "dumpfile.h" #include "tree-ssa-live.h" #include "flags.h" + #include "target.h" /* Temporary Expression Replacement (TER) *************** find_replaceable_in_bb (temp_expr_table_ *** 681,692 **** kill_expr (tab, partition); } ! /* Increment counter if this is a non BUILT_IN call. We allow ! replacement over BUILT_IN calls since many will expand to inline ! insns instead of a true call. */ ! if (is_gimple_call (stmt) ! && !((fndecl = gimple_call_fndecl (stmt)) ! && DECL_BUILT_IN (fndecl))) cur_call_cnt++; /* Now see if we are creating a new expression or not. */ --- 682,697 ---- kill_expr (tab, partition); } ! /* Increment counter if this is not a BUILT_IN call or a stmt with ! side-effects. We allow replacement over BUILT_IN calls ! since many will expand to inline insns instead of a true call. */ ! if ((gimple_code (stmt) == GIMPLE_ASM ! && gimple_asm_volatile_p (stmt)) ! || (is_gimple_call (stmt) ! && !gimple_call_internal_p (stmt) ! && !((fndecl = gimple_call_fndecl (stmt)) ! && DECL_BUILT_IN (fndecl))) ! || targetm.sched.scheduling_barrier_p (stmt)) cur_call_cnt++; /* Now see if we are creating a new expression or not. */ Index: gcc/config/i386/i386.c =================================================================== *** gcc/config/i386/i386.c (revision 194632) --- gcc/config/i386/i386.c (working copy) *************** ix86_enum_va_list (int idx, const char * *** 41159,41164 **** --- 41159,41166 ---- #define TARGET_SCHED_ADJUST_PRIORITY ix86_adjust_priority #undef TARGET_SCHED_DEPENDENCIES_EVALUATION_HOOK #define TARGET_SCHED_DEPENDENCIES_EVALUATION_HOOK ix86_dependencies_evaluation_hook + #undef TARGET_SCHED_SCHEDULING_BARRIER_P + #define TARGET_SCHED_SCHEDULING_BARRIER_P ix86_scheduling_barrier_p /* The size of the dispatch window is the total number of bytes of object code allowed in a window. */ *************** ix86_reassociation_width (unsigned int o *** 41982,41987 **** --- 41984,42014 ---- return res; } + /* Implementation of scheduling_barrier_p. */ + + static bool + ix86_scheduling_barrier_p (gimple stmt) + { + if (!is_gimple_call (stmt)) + return false; + + tree fndecl = gimple_call_fndecl (stmt); + if (!fndecl + || DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_MD) + return false; + + switch (DECL_FUNCTION_CODE (fndecl)) + { + case IX86_BUILTIN_LDMXCSR: + case IX86_BUILTIN_STMXCSR: + return true; + + default:; + } + + return false; + } + /* ??? No autovectorization into MMX or 3DNOW until we can reliably place emms and femms instructions. */ Index: gcc/Makefile.in =================================================================== *** gcc/Makefile.in (revision 194632) --- gcc/Makefile.in (working copy) *************** tree-into-ssa.o : tree-into-ssa.c $(TREE *** 2272,2278 **** tree-ssa-ter.o : tree-ssa-ter.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \ $(TREE_H) $(DIAGNOSTIC_H) $(TM_H) coretypes.h dumpfile.h \ $(TREE_SSA_LIVE_H) $(BITMAP_H) $(FLAGS_H) \ ! $(GIMPLE_PRETTY_PRINT_H) tree-ssa-coalesce.o : tree-ssa-coalesce.c $(TREE_FLOW_H) $(CONFIG_H) \ $(SYSTEM_H) $(TREE_H) $(DIAGNOSTIC_H) $(TM_H) coretypes.h dumpfile.h \ $(TREE_SSA_LIVE_H) $(BITMAP_H) $(FLAGS_H) $(HASH_TABLE_H) \ --- 2272,2278 ---- tree-ssa-ter.o : tree-ssa-ter.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \ $(TREE_H) $(DIAGNOSTIC_H) $(TM_H) coretypes.h dumpfile.h \ $(TREE_SSA_LIVE_H) $(BITMAP_H) $(FLAGS_H) \ ! $(GIMPLE_PRETTY_PRINT_H) $(TARGET_H) tree-ssa-coalesce.o : tree-ssa-coalesce.c $(TREE_FLOW_H) $(CONFIG_H) \ $(SYSTEM_H) $(TREE_H) $(DIAGNOSTIC_H) $(TM_H) coretypes.h dumpfile.h \ $(TREE_SSA_LIVE_H) $(BITMAP_H) $(FLAGS_H) $(HASH_TABLE_H) \