On Wed, Sep 26, 2018 at 8:11 PM H.J. Lu <hongjiu...@intel.com> wrote:
>
> Add -mzero-caller-saved-regs=[skip|used|all] command-line option and
> zero_caller_saved_regs("skip|used|all") function attribute:
>
> 1. -mzero-caller-saved-regs=skip and zero_caller_saved_regs("skip")
>
> Don't zero caller-saved integer registers upon function return.
>
> 2. -mzero-caller-saved-regs=used and zero_caller_saved_regs("used")
>
> Zero used caller-saved integer registers upon function return.
>
> 3. -mzero-caller-saved-regs=all and zero_caller_saved_regs("all")
>
> Zero all caller-saved integer registers upon function return.
>
> Tested on i686 and x86-64 with bootstrapping GCC trunk and
> -mzero-caller-saved-regs=used as well as -mzero-caller-saved-regs=all
> enabled by default.
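For readers skimming the thread, here is a minimal sketch of how the proposed attribute might be applied per function. The attribute name and its "skip"/"used"/"all" arguments come from the patch description above; the ZERO_REGS macro and the functions check_key and add are hypothetical names made up for this illustration. Because the attribute exists only in a compiler carrying this patch, the sketch guards it with __has_attribute and falls back to no annotation elsewhere.

```c
/* Hypothetical helper macro (not part of the patch): expands to the
   proposed attribute only when the compiler reports support for it.  */
#if defined(__has_attribute)
# if __has_attribute(zero_caller_saved_regs)
#  define ZERO_REGS(choice) __attribute__((zero_caller_saved_regs(choice)))
# endif
#endif
#ifndef ZERO_REGS
# define ZERO_REGS(choice) /* attribute unavailable: no zeroing requested */
#endif

/* Request that all caller-saved integer registers be zeroed on return,
   whatever -mzero-caller-saved-regs= default the file is built with.  */
ZERO_REGS("all")
int check_key(const unsigned char *key, int len)
{
    int acc = 0;
    for (int i = 0; i < len; i++)
        acc ^= key[i];   /* XOR-fold the bytes; stands in for real work */
    return acc;
}

/* Opt this function out of zeroing even if the command line asks for
   -mzero-caller-saved-regs=all.  */
ZERO_REGS("skip")
int add(int a, int b)
{
    return a + b;
}
```

Going by the tests in the patch (zero-scratch-regs-5.c and -6.c), the attribute is expected to override the command-line setting in both directions; the sketch relies on that reading.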

Can this be done in a target-independent way?

Richard.

> gcc/
>
> * config/i386/i386-opts.h (zero_caller_saved_regs): New enum.
> * config/i386/i386-protos.h (ix86_split_simple_return_pop_internal):
> Renamed to ...
> (ix86_split_simple_return_internal): This.
> * config/i386/i386.c (ix86_set_zero_caller_saved_regs_type): New
> function.
> (ix86_set_current_function): Call
> ix86_set_zero_caller_saved_regs_type.
> (ix86_expand_prologue): Replace gen_prologue_use with
> gen_pro_epilogue_use.
> (ix86_expand_epilogue): Replace gen_simple_return_pop_internal
> with ix86_split_simple_return_internal. Replace
> gen_simple_return_internal with ix86_split_simple_return_internal.
> (ix86_find_live_outgoing_regs): New function.
> (ix86_split_simple_return_pop_internal): Removed.
> (ix86_split_simple_return_internal): New function.
> (ix86_handle_fndecl_attribute): Support zero_caller_saved_regs
> attribute.
> (ix86_attribute_table): Add zero_caller_saved_regs.
> * config/i386/i386.h (machine_function): Add
> zero_caller_saved_regs_type and live_outgoing_regs.
> (TARGET_POP_SCRATCH_REGISTER): New.
> * config/i386/i386.md (UNSPEC_SIMPLE_RETURN): New UNSPEC.
> (UNSPECV_PROLOGUE_USE): Renamed to ...
> (UNSPECV_PRO_EPILOGUE_USE): This.
> (prologue_use): Renamed to ...
> (pro_epilogue_use): This.
> (simple_return_internal): Changed to define_insn_and_split.
> (simple_return_internal_1): New pattern.
> (simple_return_pop_internal): Replace
> ix86_split_simple_return_pop_internal with
> ix86_split_simple_return_internal. Always call
> ix86_split_simple_return_internal if epilogue_completed is
> true.
> (simple_return_pop_internal_1): New pattern.
> (Epilogue deallocator to pop peepholes): Enabled only if
> TARGET_POP_SCRATCH_REGISTER is true.
> * config/i386/i386.opt (mzero-caller-saved-regs=): New option.
> * doc/extend.texi: Document zero_caller_saved_regs attribute.
> * doc/invoke.texi: Document -mzero-caller-saved-regs=.
> > gcc/testsuite/ > > * gcc.target/i386/zero-scratch-regs-1.c: New test. > * gcc.target/i386/zero-scratch-regs-2.c: Likewise. > * gcc.target/i386/zero-scratch-regs-3.c: Likewise. > * gcc.target/i386/zero-scratch-regs-4.c: Likewise. > * gcc.target/i386/zero-scratch-regs-5.c: Likewise. > * gcc.target/i386/zero-scratch-regs-6.c: Likewise. > * gcc.target/i386/zero-scratch-regs-7.c: Likewise. > * gcc.target/i386/zero-scratch-regs-8.c: Likewise. > * gcc.target/i386/zero-scratch-regs-9.c: Likewise. > * gcc.target/i386/zero-scratch-regs-10.c: Likewise. > * gcc.target/i386/zero-scratch-regs-11.c: Likewise. > * gcc.target/i386/zero-scratch-regs-12.c: Likewise. > --- > gcc/config/i386/i386-opts.h | 7 + > gcc/config/i386/i386-protos.h | 2 +- > gcc/config/i386/i386.c | 245 ++++++++++++++++-- > gcc/config/i386/i386.h | 13 + > gcc/config/i386/i386.md | 54 +++- > gcc/config/i386/i386.opt | 17 ++ > gcc/doc/extend.texi | 8 + > gcc/doc/invoke.texi | 12 +- > .../gcc.target/i386/zero-scratch-regs-1.c | 10 + > .../gcc.target/i386/zero-scratch-regs-10.c | 19 ++ > .../gcc.target/i386/zero-scratch-regs-11.c | 39 +++ > .../gcc.target/i386/zero-scratch-regs-12.c | 39 +++ > .../gcc.target/i386/zero-scratch-regs-2.c | 17 ++ > .../gcc.target/i386/zero-scratch-regs-3.c | 10 + > .../gcc.target/i386/zero-scratch-regs-4.c | 12 + > .../gcc.target/i386/zero-scratch-regs-5.c | 18 ++ > .../gcc.target/i386/zero-scratch-regs-6.c | 12 + > .../gcc.target/i386/zero-scratch-regs-7.c | 11 + > .../gcc.target/i386/zero-scratch-regs-8.c | 17 ++ > .../gcc.target/i386/zero-scratch-regs-9.c | 13 + > 20 files changed, 538 insertions(+), 37 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c > 
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c > create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c > > diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h > index 46366cbfa72..7f9a92e7e5b 100644 > --- a/gcc/config/i386/i386-opts.h > +++ b/gcc/config/i386/i386-opts.h > @@ -119,4 +119,11 @@ enum indirect_branch { > indirect_branch_thunk_extern > }; > > +enum zero_caller_saved_regs { > + zero_caller_saved_regs_unset = 0, > + zero_caller_saved_regs_skip, > + zero_caller_saved_regs_used, > + zero_caller_saved_regs_all > +}; > + > #endif > diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h > index d1d59633dc0..a92f34a48b1 100644 > --- a/gcc/config/i386/i386-protos.h > +++ b/gcc/config/i386/i386-protos.h > @@ -310,7 +310,7 @@ extern const char * ix86_output_call_insn (rtx_insn > *insn, rtx call_op); > extern const char * ix86_output_indirect_jmp (rtx call_op); > extern const char * ix86_output_function_return (bool long_p); > extern const char * ix86_output_indirect_function_return (rtx ret_op); > -extern void ix86_split_simple_return_pop_internal (rtx); > +extern void ix86_split_simple_return_internal (rtx); > extern bool ix86_operands_ok_for_move_multiple (rtx *operands, bool load, > machine_mode mode); > extern int ix86_min_insn_size (rtx_insn *); > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index ef72219f165..359062e6f44 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -5561,6 +5561,40 @@ ix86_set_func_type (tree fndecl) > } > } > > +/* Set the zero_caller_saved_regs_type field 
from the function FNDECL. */ > + > +static void > +ix86_set_zero_caller_saved_regs_type (tree fndecl) > +{ > + if (cfun->machine->zero_caller_saved_regs_type > + == zero_caller_saved_regs_unset) > + { > + tree attr = lookup_attribute ("zero_caller_saved_regs", > + DECL_ATTRIBUTES (fndecl)); > + if (attr != NULL) > + { > + tree args = TREE_VALUE (attr); > + if (args == NULL) > + gcc_unreachable (); > + tree cst = TREE_VALUE (args); > + if (strcmp (TREE_STRING_POINTER (cst), "skip") == 0) > + cfun->machine->zero_caller_saved_regs_type > + = zero_caller_saved_regs_skip; > + else if (strcmp (TREE_STRING_POINTER (cst), "used") == 0) > + cfun->machine->zero_caller_saved_regs_type > + = zero_caller_saved_regs_used; > + else if (strcmp (TREE_STRING_POINTER (cst), "all") == 0) > + cfun->machine->zero_caller_saved_regs_type > + = zero_caller_saved_regs_all; > + else > + gcc_unreachable (); > + } > + else > + cfun->machine->zero_caller_saved_regs_type > + = ix86_zero_caller_saved_regs; > + } > +} > + > /* Set the indirect_branch_type field from the function FNDECL. */ > > static void > @@ -5661,6 +5695,7 @@ ix86_set_current_function (tree fndecl) > { > ix86_set_func_type (fndecl); > ix86_set_indirect_branch_type (fndecl); > + ix86_set_zero_caller_saved_regs_type (fndecl); > } > return; > } > @@ -5682,6 +5717,7 @@ ix86_set_current_function (tree fndecl) > > ix86_set_func_type (fndecl); > ix86_set_indirect_branch_type (fndecl); > + ix86_set_zero_caller_saved_regs_type (fndecl); > > tree new_tree = DECL_FUNCTION_SPECIFIC_TARGET (fndecl); > if (new_tree == NULL_TREE) > @@ -13542,7 +13578,7 @@ ix86_expand_prologue (void) > insn = emit_insn (gen_set_got (pic)); > RTX_FRAME_RELATED_P (insn) = 1; > add_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL_RTX); > - emit_insn (gen_prologue_use (pic)); > + emit_insn (gen_pro_epilogue_use (pic)); > /* Deleting already emmitted SET_GOT if exist and allocated to > REAL_PIC_OFFSET_TABLE_REGNUM. 
*/ > ix86_elim_entry_set_got (pic); > @@ -13571,7 +13607,7 @@ ix86_expand_prologue (void) > Further, prevent alloca modifications to the stack pointer from being > combined with prologue modifications. */ > if (TARGET_SEH) > - emit_insn (gen_prologue_use (stack_pointer_rtx)); > + emit_insn (gen_pro_epilogue_use (stack_pointer_rtx)); > } > > /* Emit code to restore REG using a POP insn. */ > @@ -14289,7 +14325,7 @@ ix86_expand_epilogue (int style) > emit_jump_insn (gen_simple_return_indirect_internal (ecx)); > } > else > - emit_jump_insn (gen_simple_return_pop_internal (popc)); > + ix86_split_simple_return_internal (popc); > } > else if (!m->call_ms2sysv || !restore_stub_is_tail) > { > @@ -14316,7 +14352,7 @@ ix86_expand_epilogue (int style) > emit_jump_insn (gen_simple_return_indirect_internal (ecx)); > } > else > - emit_jump_insn (gen_simple_return_internal ()); > + ix86_split_simple_return_internal (NULL_RTX); > } > > /* Restore the state back to the state from the prologue, > @@ -28402,37 +28438,169 @@ ix86_output_indirect_function_return (rtx ret_op) > return "%!jmp\t%A0"; > } > > -/* Split simple return with popping POPC bytes from stack to indirect > - branch with stack adjustment . */ > +/* Find general registers which are live at the exit of basic block BB > + and set their corresponding bits in LIVE_OUTGOING_REGS. */ > + > +static void > +ix86_find_live_outgoing_regs (basic_block bb, > + unsigned int &live_outgoing_regs) > +{ > + bitmap live_out = df_get_live_out (bb); > + > + bool zero_all = (cfun->machine->zero_caller_saved_regs_type > + == zero_caller_saved_regs_all); > + > + unsigned int regno; > + > + /* Check for live outgoing registers. */ > + for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++) > + { > + /* Only zero general registers. */ > + if (!GENERAL_REGNO_P (regno)) > + continue; > + > + int i = regno; > + if (i >= FIRST_REX_INT_REG) > + i -= (FIRST_REX_INT_REG - LAST_INT_REG - 1); > + > + /* No need to check it again if it is live. 
*/ > + if ((live_outgoing_regs & (1 << i))) > + continue; > + > + /* A register is considered LIVE if > + 1. It is a fixed register. > + 2. If isn't a caller-saved register. > + 3. If it is a live outgoing register. > + 4. It is never used in the function and we don't zero all > + caller-saved registers. > + */ > + if (fixed_regs[regno] > + || !call_used_regs[regno] > + || REGNO_REG_SET_P (live_out, regno) > + || (!zero_all && !df_regs_ever_live_p (regno))) > + live_outgoing_regs |= 1 << i; > + } > +} > + > +/* Split simple return with popping POPC bytes from stack, if POPC > + isn't NULL_RTX, and zero caller-saved general registers if needed. > + When popping POPC bytes from stack for -mfunction-return=, convert > + return to indirect branch with stack adjustment. */ > > void > -ix86_split_simple_return_pop_internal (rtx popc) > +ix86_split_simple_return_internal (rtx popc) > { > - struct machine_function *m = cfun->machine; > - rtx ecx = gen_rtx_REG (SImode, CX_REG); > - rtx_insn *insn; > + /* No need to zero caller-saved registers in main (). Don't zero > + caller-saved registers if __builtin_eh_return is called since it > + isn't a normal function return. */ > + if ((cfun->machine->zero_caller_saved_regs_type > + != zero_caller_saved_regs_skip) > + && !crtl->calls_eh_return > + && cfun->machine->func_type == TYPE_NORMAL > + && !MAIN_NAME_P (DECL_NAME (current_function_decl))) > + { > + unsigned int &live_outgoing_regs > + = cfun->machine->live_outgoing_regs; > > - /* There is no "pascal" calling convention in any 64bit ABI. */ > - gcc_assert (!TARGET_64BIT); > + if (live_outgoing_regs == 0) > + { > + edge e; > + edge_iterator ei; > > - insn = emit_insn (gen_pop (ecx)); > - m->fs.cfa_offset -= UNITS_PER_WORD; > - m->fs.sp_offset -= UNITS_PER_WORD; > + /* ECX register is used for return with pop. 
*/ > + if (popc != NULL_RTX > + && (cfun->machine->function_return_type > + != indirect_branch_keep)) > + live_outgoing_regs = 1 << CX_REG; > > - rtx x = plus_constant (Pmode, stack_pointer_rtx, UNITS_PER_WORD); > - x = gen_rtx_SET (stack_pointer_rtx, x); > - add_reg_note (insn, REG_CFA_ADJUST_CFA, x); > - add_reg_note (insn, REG_CFA_REGISTER, gen_rtx_SET (ecx, pc_rtx)); > - RTX_FRAME_RELATED_P (insn) = 1; > + FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) > + { > + ix86_find_live_outgoing_regs (e->src, > + live_outgoing_regs); > + } > + } > > - x = gen_rtx_PLUS (Pmode, stack_pointer_rtx, popc); > - x = gen_rtx_SET (stack_pointer_rtx, x); > - insn = emit_insn (x); > - add_reg_note (insn, REG_CFA_ADJUST_CFA, x); > - RTX_FRAME_RELATED_P (insn) = 1; > + rtx zero = NULL_RTX; > + > + unsigned int regno; > + > + for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++) > + { > + if (!GENERAL_REGNO_P (regno)) > + continue; > + > + int i = regno; > + if (i >= FIRST_REX_INT_REG) > + i -= (FIRST_REX_INT_REG - LAST_INT_REG - 1); > + if ((live_outgoing_regs & (1 << i))) > + continue; > + > + /* Zero out dead caller-saved register. We only need to zero > + the lower 32 bits. */ > + rtx reg = gen_rtx_REG (SImode, regno); > + if (zero == NULL_RTX) > + { > + zero = reg; > + rtx tmp = gen_rtx_SET (reg, const0_rtx); > + if (!TARGET_USE_MOV0 || optimize_insn_for_size_p ()) > + { > + rtx clob = gen_rtx_CLOBBER (VOIDmode, > + gen_rtx_REG (CCmode, > + FLAGS_REG)); > + tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, > + tmp, > + clob)); > + } > + emit_insn (tmp); > + } > + else > + emit_move_insn (reg, zero); > + > + /* Mark it in use */ > + emit_insn (gen_pro_epilogue_use (reg)); > + } > + } > + > + if (popc) > + { > + if (cfun->machine->function_return_type != indirect_branch_keep) > + { > + struct machine_function *m = cfun->machine; > + rtx ecx = gen_rtx_REG (SImode, CX_REG); > + rtx_insn *insn; > + > + /* There is no "pascal" calling convention in any 64bit ABI. 
*/ > + gcc_assert (!TARGET_64BIT); > + > + insn = emit_insn (gen_pop (ecx)); > + m->fs.cfa_offset -= UNITS_PER_WORD; > + m->fs.sp_offset -= UNITS_PER_WORD; > + > + rtx x = plus_constant (Pmode, stack_pointer_rtx, > + UNITS_PER_WORD); > + x = gen_rtx_SET (stack_pointer_rtx, x); > + add_reg_note (insn, REG_CFA_ADJUST_CFA, x); > + add_reg_note (insn, REG_CFA_REGISTER, > + gen_rtx_SET (ecx, pc_rtx)); > + RTX_FRAME_RELATED_P (insn) = 1; > > - /* Now return address is in ECX. */ > - emit_jump_insn (gen_simple_return_indirect_internal (ecx)); > + x = gen_rtx_PLUS (Pmode, stack_pointer_rtx, popc); > + x = gen_rtx_SET (stack_pointer_rtx, x); > + insn = emit_insn (x); > + add_reg_note (insn, REG_CFA_ADJUST_CFA, copy_rtx (x)); > + RTX_FRAME_RELATED_P (insn) = 1; > + > + /* Mark ECX in use */ > + emit_insn (gen_pro_epilogue_use (ecx)); > + > + /* Now return address is in ECX. */ > + emit_jump_insn (gen_simple_return_indirect_internal (ecx)); > + } > + else > + emit_jump_insn (gen_simple_return_pop_internal_1 (popc)); > + } > + else > + emit_jump_insn (gen_simple_return_internal_1 ()); > } > > /* Output the assembly for a call instruction. 
*/ > @@ -40798,6 +40966,27 @@ ix86_handle_fndecl_attribute (tree *node, tree name, > tree args, int, > } > } > > + if (is_attribute_p ("zero_caller_saved_regs", name)) > + { > + tree cst = TREE_VALUE (args); > + if (TREE_CODE (cst) != STRING_CST) > + { > + warning (OPT_Wattributes, > + "%qE attribute requires a string constant argument", > + name); > + *no_add_attrs = true; > + } > + else if (strcmp (TREE_STRING_POINTER (cst), "skip") != 0 > + && strcmp (TREE_STRING_POINTER (cst), "used") != 0 > + && strcmp (TREE_STRING_POINTER (cst), "all") != 0) > + { > + warning (OPT_Wattributes, > + "argument to %qE attribute is not (skip|used|all)", > + name); > + *no_add_attrs = true; > + } > + } > + > return NULL_TREE; > } > > @@ -45099,6 +45288,8 @@ static const struct attribute_spec > ix86_attribute_table[] = > ix86_handle_fndecl_attribute, NULL }, > { "indirect_return", 0, 0, false, true, true, false, > NULL, NULL }, > + { "zero_caller_saved_regs", 1, 1, true, false, false, false, > + ix86_handle_fndecl_attribute, NULL }, > > /* End element. */ > { NULL, 0, 0, false, false, false, false, NULL, NULL } > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h > index 6445ee5d50a..60deec0a496 100644 > --- a/gcc/config/i386/i386.h > +++ b/gcc/config/i386/i386.h > @@ -2715,6 +2715,10 @@ struct GTY(()) machine_function { > the "interrupt" or "no_caller_saved_registers" attribute. */ > BOOL_BITFIELD no_caller_saved_registers : 1; > > + /* How to clear caller-saved general registers upon function > + return. */ > + ENUM_BITFIELD(zero_caller_saved_regs) zero_caller_saved_regs_type : 3; > + > /* If true, there is register available for argument passing. This > is used only in ix86_function_ok_for_sibcall by 32-bit to determine > if there is scratch register available for indirect sibcall. In > @@ -2742,6 +2746,9 @@ struct GTY(()) machine_function { > /* If true, ENDBR is queued at function entrance. 
*/ > BOOL_BITFIELD endbr_queued_at_entrance : 1; > > + /* Registers live at exit. */ > + unsigned int live_outgoing_regs; > + > /* The largest alignment, in bytes, of stack slot actually used. */ > unsigned int max_used_stack_alignment; > > @@ -2841,6 +2848,12 @@ extern void debug_dispatch_window (int); > (ix86_indirect_branch_register \ > || cfun->machine->indirect_branch_type != indirect_branch_keep) > > +#define TARGET_POP_SCRATCH_REGISTER \ > + (TARGET_64BIT \ > + || (cfun->machine->zero_caller_saved_regs_type \ > + == zero_caller_saved_regs_skip) \ > + || cfun->machine->function_return_type == indirect_branch_keep) > + > #define IX86_HLE_ACQUIRE (1 << 16) > #define IX86_HLE_RELEASE (1 << 17) > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md > index 86f2c032e1b..cf8faacb7e3 100644 > --- a/gcc/config/i386/i386.md > +++ b/gcc/config/i386/i386.md > @@ -183,6 +183,8 @@ > UNSPEC_PDEP > UNSPEC_PEXT > > + UNSPEC_SIMPLE_RETURN > + > ;; IRET support > UNSPEC_INTERRUPT_RETURN > ]) > @@ -193,7 +195,7 @@ > UNSPECV_STACK_PROBE > UNSPECV_PROBE_STACK_RANGE > UNSPECV_ALIGN > - UNSPECV_PROLOGUE_USE > + UNSPECV_PRO_EPILOGUE_USE > UNSPECV_SPLIT_STACK_RETURN > UNSPECV_CLD > UNSPECV_NOPS > @@ -12997,8 +12999,8 @@ > > ;; As USE insns aren't meaningful after reload, this is used instead > ;; to prevent deleting instructions setting registers for PIC code > -(define_insn "prologue_use" > - [(unspec_volatile [(match_operand 0)] UNSPECV_PROLOGUE_USE)] > +(define_insn "pro_epilogue_use" > + [(unspec_volatile [(match_operand 0)] UNSPECV_PRO_EPILOGUE_USE)] > "" > "" > [(set_attr "length" "0")]) > @@ -13039,10 +13041,23 @@ > } > }) > > -(define_insn "simple_return_internal" > +(define_insn_and_split "simple_return_internal" > [(simple_return)] > "reload_completed" > "* return ix86_output_function_return (false);" > + "&& epilogue_completed" > + [(const_int 0)] > + "ix86_split_simple_return_internal (NULL_RTX); DONE;" > + [(set_attr "length" "1") > + (set_attr "atom_unit" 
"jeu") > + (set_attr "length_immediate" "0") > + (set_attr "modrm" "0")]) > + > +(define_insn "simple_return_internal_1" > + [(simple_return) > + (unspec [(const_int 0)] UNSPEC_SIMPLE_RETURN)] > + "reload_completed" > + "* return ix86_output_function_return (false);" > [(set_attr "length" "1") > (set_attr "atom_unit" "jeu") > (set_attr "length_immediate" "0") > @@ -13075,9 +13090,21 @@ > (use (match_operand:SI 0 "const_int_operand"))] > "reload_completed" > "%!ret\t%0" > - "&& cfun->machine->function_return_type != indirect_branch_keep" > + "&& (epilogue_completed > + || cfun->machine->function_return_type != indirect_branch_keep)" > [(const_int 0)] > - "ix86_split_simple_return_pop_internal (operands[0]); DONE;" > + "ix86_split_simple_return_internal (operands[0]); DONE;" > + [(set_attr "length" "3") > + (set_attr "atom_unit" "jeu") > + (set_attr "length_immediate" "2") > + (set_attr "modrm" "0")]) > + > +(define_insn "simple_return_pop_internal_1" > + [(simple_return) > + (use (match_operand:SI 0 "const_int_operand")) > + (unspec [(const_int 0)] UNSPEC_SIMPLE_RETURN)] > + "reload_completed" > + "%!ret\t%0" > [(set_attr "length" "3") > (set_attr "atom_unit" "jeu") > (set_attr "length_immediate" "2") > @@ -18900,6 +18927,11 @@ > (set (mem:W (pre_dec:P (reg:P SP_REG))) (match_dup 1))]) > > ;; Convert epilogue deallocator to pop. > +;; Don't do it when > +;; -mfunction-return= -mzero-caller-saved-regs= > +;; is used in 32-bit snce return with stack pop needs to increment > +;; stack register and scratch registers must be zeroed. Pop scratch > +;; register will load value from stack. 
> (define_peephole2 > [(match_scratch:W 1 "r") > (parallel [(set (reg:P SP_REG) > @@ -18908,6 +18940,7 @@ > (clobber (reg:CC FLAGS_REG)) > (clobber (mem:BLK (scratch)))])] > "(TARGET_SINGLE_POP || optimize_insn_for_size_p ()) > + && TARGET_POP_SCRATCH_REGISTER > && INTVAL (operands[0]) == GET_MODE_SIZE (word_mode)" > [(parallel [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG)))) > (clobber (mem:BLK (scratch)))])]) > @@ -18923,6 +18956,7 @@ > (clobber (reg:CC FLAGS_REG)) > (clobber (mem:BLK (scratch)))])] > "(TARGET_DOUBLE_POP || optimize_insn_for_size_p ()) > + && TARGET_POP_SCRATCH_REGISTER > && INTVAL (operands[0]) == 2*GET_MODE_SIZE (word_mode)" > [(parallel [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG)))) > (clobber (mem:BLK (scratch)))]) > @@ -18936,6 +18970,7 @@ > (clobber (reg:CC FLAGS_REG)) > (clobber (mem:BLK (scratch)))])] > "optimize_insn_for_size_p () > + && TARGET_POP_SCRATCH_REGISTER > && INTVAL (operands[0]) == 2*GET_MODE_SIZE (word_mode)" > [(parallel [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG)))) > (clobber (mem:BLK (scratch)))]) > @@ -18948,7 +18983,8 @@ > (plus:P (reg:P SP_REG) > (match_operand:P 0 "const_int_operand"))) > (clobber (reg:CC FLAGS_REG))])] > - "INTVAL (operands[0]) == GET_MODE_SIZE (word_mode)" > + "TARGET_POP_SCRATCH_REGISTER > + && INTVAL (operands[0]) == GET_MODE_SIZE (word_mode)" > [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))]) > > ;; Two pops case is tricky, since pop causes dependency > @@ -18960,7 +18996,8 @@ > (plus:P (reg:P SP_REG) > (match_operand:P 0 "const_int_operand"))) > (clobber (reg:CC FLAGS_REG))])] > - "INTVAL (operands[0]) == 2*GET_MODE_SIZE (word_mode)" > + "TARGET_POP_SCRATCH_REGISTER > + && INTVAL (operands[0]) == 2*GET_MODE_SIZE (word_mode)" > [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG)))) > (set (match_dup 2) (mem:W (post_inc:P (reg:P SP_REG))))]) > > @@ -18971,6 +19008,7 @@ > (match_operand:P 0 "const_int_operand"))) > (clobber (reg:CC FLAGS_REG))])] > 
"optimize_insn_for_size_p () > + && TARGET_POP_SCRATCH_REGISTER > && INTVAL (operands[0]) == 2*GET_MODE_SIZE (word_mode)" > [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG)))) > (set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))]) > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt > index e7fbf9b6f99..da9b442ecbf 100644 > --- a/gcc/config/i386/i386.opt > +++ b/gcc/config/i386/i386.opt > @@ -1063,3 +1063,20 @@ Support WAITPKG built-in functions and code generation. > mcldemote > Target Report Mask(ISA_CLDEMOTE) Var(ix86_isa_flags2) Save > Support CLDEMOTE built-in functions and code generation. > + > +mzero-caller-saved-regs= > +Target Report RejectNegative Joined Enum(zero_caller_saved_regs) > Var(ix86_zero_caller_saved_regs) Init(zero_caller_saved_regs_skip) > +Clear caller-saved general registers upon function return. > + > +Enum > +Name(zero_caller_saved_regs) Type(enum zero_caller_saved_regs) > +Known choices of clearing caller-saved general registers upon function > return (for use with the -mzero-caller-saved-regs= option): > + > +EnumValue > +Enum(zero_caller_saved_regs) String(skip) Value(zero_caller_saved_regs_skip) > + > +EnumValue > +Enum(zero_caller_saved_regs) String(used) Value(zero_caller_saved_regs_used) > + > +EnumValue > +Enum(zero_caller_saved_regs) String(all) Value(zero_caller_saved_regs_all) > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi > index cfe6a8e5bb8..023f6155e58 100644 > --- a/gcc/doc/extend.texi > +++ b/gcc/doc/extend.texi > @@ -5931,6 +5931,14 @@ The @code{indirect_return} attribute can be applied to > a function, > as well as variable or type of function pointer to inform the > compiler that the function may return via indirect branch. 
> > +@item zero_caller_saved_regs("@var{choice}") > +@cindex @code{zero_caller_saved_regs} function attribute, x86 > +On x86 targets, the @code{zero_caller_saved_regs} attribute causes the > +compiler to zero caller-saved integer registers at function return with > +@var{choice}. @samp{skip} doesn't zero caller-saved integer registers. > +@samp{used} zeros caller-saved integer registers which are used in > +function. @samp{all} zeros all caller-saved integer registers. > + > @end table > > On the x86, the inliner does not inline a > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index 7ef4e7a449b..796477452d5 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -1307,7 +1307,7 @@ See RS/6000 and PowerPC Options. > -mstack-protector-guard-symbol=@var{symbol} @gol > -mgeneral-regs-only -mcall-ms2sysv-xlogues @gol > -mindirect-branch=@var{choice} -mfunction-return=@var{choice} @gol > --mindirect-branch-register} > +-mindirect-branch-register -mzero-caller-saved-regs=@var{choice}} > > @emph{x86 Windows Options} > @gccoptlist{-mconsole -mcygwin -mno-cygwin -mdll @gol > @@ -28459,6 +28459,16 @@ not be reachable in the large code model. > @opindex -mindirect-branch-register > Force indirect call and jump via register. > > +@item -mzero-caller-saved-regs=@var{choice} > +@opindex -mzero-caller-saved-regs > +Zero caller-saved integer registers at function return with @var{choice}. > +The default is @samp{skip}, which doesn't zero caller-saved integer > +registers. @samp{used} zeros caller-saved integer registers which are > +used in function. @samp{all} zeros all caller-saved integer registers. > +You can control this behavior for a specific function by using the > +function attribute @code{zero_caller_saved_regs}. > +@xref{Function Attributes}. 
> + > @end table > > These @samp{-m} switches are supported in addition to the above > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c > b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c > new file mode 100644 > index 00000000000..08533500eff > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c > @@ -0,0 +1,10 @@ > +/* { dg-do compile { target *-*-linux* } } */ > +/* { dg-options "-O2 -mzero-caller-saved-regs=used" } */ > + > +void > +foo (void) > +{ > +} > + > +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */ > +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */ > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c > b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c > new file mode 100644 > index 00000000000..961bb720cb2 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c > @@ -0,0 +1,19 @@ > +/* { dg-do compile { target *-*-linux* } } */ > +/* { dg-options "-O2 -mzero-caller-saved-regs=skip" } */ > + > +extern int foo (int) __attribute__ ((zero_caller_saved_regs("all"))); > + > +int > +foo (int x) > +{ > + return x; > +} > + > +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */ > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */ > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } > } } } */ > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } > } } } */ > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } > } } } */ > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } > } } } */ > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } > } } } */ > +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! 
ia32 } > } } } */ > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c > b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c > new file mode 100644 > index 00000000000..677c5b3d9fd > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c > @@ -0,0 +1,39 @@ > +/* { dg-do run { target *-*-linux* } } */ > +/* { dg-options "-O2 -mzero-caller-saved-regs=used" } */ > + > +struct S { int i; }; > +__attribute__((const, noinline, noclone)) > +struct S foo (int x) > +{ > + struct S s; > + s.i = x; > + return s; > +} > + > +int a[2048], b[2048], c[2048], d[2048]; > +struct S e[2048]; > + > +__attribute__((noinline, noclone)) void > +bar (void) > +{ > + int i; > + for (i = 0; i < 1024; i++) > + { > + e[i] = foo (i); > + a[i+2] = a[i] + a[i+1]; > + b[10] = b[10] + i; > + c[i] = c[2047 - i]; > + d[i] = d[i + 1]; > + } > +} > + > +int > +main () > +{ > + int i; > + bar (); > + for (i = 0; i < 1024; i++) > + if (e[i].i != i) > + __builtin_abort (); > + return 0; > +} > diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c > b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c > new file mode 100644 > index 00000000000..26e48d56179 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c > @@ -0,0 +1,39 @@ > +/* { dg-do run { target *-*-linux* } } */ > +/* { dg-options "-O2 -mzero-caller-saved-regs=all" } */ > + > +struct S { int i; }; > +__attribute__((const, noinline, noclone)) > +struct S foo (int x) > +{ > + struct S s; > + s.i = x; > + return s; > +} > + > +int a[2048], b[2048], c[2048], d[2048]; > +struct S e[2048]; > + > +__attribute__((noinline, noclone)) void > +bar (void) > +{ > + int i; > + for (i = 0; i < 1024; i++) > + { > + e[i] = foo (i); > + a[i+2] = a[i] + a[i+1]; > + b[10] = b[10] + i; > + c[i] = c[2047 - i]; > + d[i] = d[i + 1]; > + } > +} > + > +int > +main () > +{ > + int i; > + bar (); > + for (i = 0; i < 1024; i++) > + if (e[i].i != i) > + __builtin_abort (); > + return 0; > +} > diff 
--git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> new file mode 100644
> index 00000000000..cc402ad605c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -mzero-caller-saved-regs=all" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> new file mode 100644
> index 00000000000..ed75361d545
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -mzero-caller-saved-regs=skip" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> new file mode 100644
> index 00000000000..83e2c4efcf2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -mzero-caller-saved-regs=skip" } */
> +
> +extern void foo (void) __attribute__ ((zero_caller_saved_regs("used")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> new file mode 100644
> index 00000000000..ef902d5311a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -mzero-caller-saved-regs=skip" } */
> +
> +__attribute__ ((zero_caller_saved_regs("all")))
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> new file mode 100644
> index 00000000000..91e54b5403e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -mzero-caller-saved-regs=all" } */
> +
> +extern void foo (void) __attribute__ ((zero_caller_saved_regs("skip")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> new file mode 100644
> index 00000000000..5e21de9bca5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -mzero-caller-saved-regs=used" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> new file mode 100644
> index 00000000000..27fd9e48640
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -mzero-caller-saved-regs=all" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> new file mode 100644
> index 00000000000..dee849d9e5e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -mzero-caller-saved-regs=skip" } */
> +
> +extern int foo (int) __attribute__ ((zero_caller_saved_regs("used")));
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> --
> 2.17.1
>