On Sun, Nov 11, 2012 at 7:36 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
> Regarding vzeroupper insertion pass - we will use gcc pass manager to
> insert a target-dependent pass directly after reload ...

... like the attached patch.

The patch inserts the vzeroupper pass directly after reload, so spills from
256-bit registers are considered when processing the AVX_U128 entity. The
patched gcc reruns the mode-switching pass, so the entry function of
mode-switching has to be exported.

2012-11-10  Uros Bizjak  <ubiz...@gmail.com>
	    Vladimir Yakovlev  <vladimir.b.yakov...@intel.com>

	PR target/47440
	* config/i386/i386.c (struct rtl_opt_pass pass_insert_vzeroupper): New.
	(gate_insert_vzeroupper): New function.
	(rest_of_handle_insert_vzeroupper): Ditto.
	(ix86_option_override): Register vzeroupper insertion pass here.
	(ix86_init_machine_status): Remove optimize_mode_switching[AVX_U128]
	initialization.
	* mode-switching.c (optimize_mode_switching): Export.
	* rtl.h (optimize_mode_switching): Declare prototype.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32} AVX
target. A functionally equivalent patch was tested on SPEC2000/2006 by
Vladimir.

I will wait a day or two for possible comments. I guess that a
non-algorithmic change to mode-switching doesn't need an approval...

Uros.
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 193409)
+++ config/i386/i386.c	(working copy)
@@ -2301,6 +2301,51 @@ static const char *const cpu_names[TARGET_CPU_DEFA
   "btver2"
 };
 
+static bool
+gate_insert_vzeroupper (void)
+{
+  return TARGET_VZEROUPPER;
+}
+
+static unsigned int
+rest_of_handle_insert_vzeroupper (void)
+{
+  int i;
+
+  /* vzeroupper instructions are inserted immediately after reload to
+     account for possible spills from 256bit registers.  The pass
+     reuses mode switching infrastructure by re-running mode insertion
+     pass, so disable entities that have already been processed.  */
+  for (i = 0; i < MAX_386_ENTITIES; i++)
+    ix86_optimize_mode_switching[i] = 0;
+
+  ix86_optimize_mode_switching[AVX_U128] = 1;
+
+  optimize_mode_switching ();
+  return 0;
+}
+
+struct rtl_opt_pass pass_insert_vzeroupper =
+{
+ {
+  RTL_PASS,
+  "vzeroupper",                         /* name */
+  OPTGROUP_NONE,                        /* optinfo_flags */
+  gate_insert_vzeroupper,               /* gate */
+  rest_of_handle_insert_vzeroupper,     /* execute */
+  NULL,                                 /* sub */
+  NULL,                                 /* next */
+  0,                                    /* static_pass_number */
+  TV_NONE,                              /* tv_id */
+  0,                                    /* properties_required */
+  0,                                    /* properties_provided */
+  0,                                    /* properties_destroyed */
+  0,                                    /* todo_flags_start */
+  TODO_df_finish | TODO_verify_rtl_sharing |
+  0,                                    /* todo_flags_finish */
+ }
+};
+
 /* Return true if a red-zone is in use.  */
 
 static inline bool
@@ -3705,7 +3750,16 @@ ix86_option_override_internal (bool main_args_p)
 static void
 ix86_option_override (void)
 {
+  static struct register_pass_info insert_vzeroupper_info
+    = { &pass_insert_vzeroupper.pass, "reload",
+        1, PASS_POS_INSERT_AFTER
+      };
+
   ix86_option_override_internal (true);
+
+
+  /* This needs to be done at start up.  It's convenient to do it here.  */
+  register_pass (&insert_vzeroupper_info);
 }
 
 /* Update register usage after having seen the compiler flags.  */
@@ -23406,7 +23460,6 @@ ix86_init_machine_status (void)
   f = ggc_alloc_cleared_machine_function ();
   f->use_fast_prologue_epilogue_nregs = -1;
   f->call_abi = ix86_abi;
-  f->optimize_mode_switching[AVX_U128] = TARGET_VZEROUPPER;
 
   return f;
 }
Index: mode-switching.c
===================================================================
--- mode-switching.c	(revision 193407)
+++ mode-switching.c	(working copy)
@@ -447,7 +447,7 @@ create_pre_exit (int n_entities, int *entity_map,
 /* Find all insns that need a particular mode setting, and insert the
    necessary mode switches.  Return true if we did work.  */
 
-static int
+int
 optimize_mode_switching (void)
 {
   rtx insn;
Index: rtl.h
===================================================================
--- rtl.h	(revision 193407)
+++ rtl.h	(working copy)
@@ -2719,6 +2719,9 @@ extern rtx get_reg_base_value (unsigned int);
 extern int stack_regs_mentioned (const_rtx insn);
 #endif
 
+/* In mode-switching.c */
+extern int optimize_mode_switching (void);
+
 /* In toplev.c */
 extern GTY(()) rtx stack_limit_rtx;
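
For readers who want to see the entity-masking idea in
rest_of_handle_insert_vzeroupper in isolation, here is a minimal standalone
sketch. It is not GCC code: the enum values, the toy_* functions and the
counters are all hypothetical stand-ins, and the "mode switching" routine
merely counts which entities it was asked to process. It only illustrates
why the second, post-reload run touches AVX_U128 and nothing else.

```c
#include <assert.h>

/* Hypothetical stand-ins for the i386 mode-switching entities; the real
   list in the i386 backend may differ.  */
enum toy_entity { TOY_AVX_U128, TOY_I387_A, TOY_I387_B, TOY_MAX_ENTITIES };

/* Per-entity enable flags, modelling ix86_optimize_mode_switching[].  */
static int toy_enabled[TOY_MAX_ENTITIES];

/* How many times each entity has been processed so far.  */
static int toy_processed[TOY_MAX_ENTITIES];

/* Toy replacement for optimize_mode_switching (): visits every enabled
   entity and returns nonzero if it did any work.  */
static int
toy_optimize_mode_switching (void)
{
  int i, did_work = 0;

  for (i = 0; i < TOY_MAX_ENTITIES; i++)
    if (toy_enabled[i])
      {
        toy_processed[i]++;
        did_work = 1;
      }
  return did_work;
}

/* Models the new pass: disable every entity, enable only the AVX
   upper-128 entity, then rerun the shared routine.  */
static int
toy_insert_vzeroupper (void)
{
  int i;

  for (i = 0; i < TOY_MAX_ENTITIES; i++)
    toy_enabled[i] = 0;
  toy_enabled[TOY_AVX_U128] = 1;

  return toy_optimize_mode_switching ();
}

/* Runs the two phases in order and returns how often AVX_U128 was
   processed in total.  */
static int
toy_demo (void)
{
  int before_a, before_b;

  /* Pre-reload run: only the (hypothetical) x87 entities are enabled.  */
  toy_enabled[TOY_AVX_U128] = 0;
  toy_enabled[TOY_I387_A] = 1;
  toy_enabled[TOY_I387_B] = 1;
  toy_optimize_mode_switching ();

  before_a = toy_processed[TOY_I387_A];
  before_b = toy_processed[TOY_I387_B];

  /* Post-reload run: must not reprocess the x87 entities.  */
  toy_insert_vzeroupper ();
  assert (toy_processed[TOY_I387_A] == before_a);
  assert (toy_processed[TOY_I387_B] == before_b);

  return toy_processed[TOY_AVX_U128];
}
```

Calling toy_demo () from a main function runs both phases; the assertions
inside it check that the entities handled in the first run are left alone
by the second one, which is the property the flag reset in the patch is
there to guarantee.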