Steve Ellcey <sell...@marvell.com> writes:
> On Thu, 2018-12-06 at 12:25 +0000, Richard Sandiford wrote:
>>
>> Since we're looking at the call insns anyway, we could have a hook that
>> "jousts" two calls and picks the one that preserves *fewer* registers.
>> This would mean that the loop produces a single instruction that
>> conservatively describes the call-preserved registers.  We could then
>> stash that instruction in lra_reg instead of the current
>> check_part_clobbered boolean.
>>
>> The hook should by default be a null pointer, so that we can avoid
>> the instruction walk on targets that don't need it.
>>
>> That would mean that LRA would always have a call instruction to hand
>> when asking about call-preserved information.  So I think we should
>> add an insn parameter to targetm.hard_regno_call_part_clobbered,
>> with a null insn selecting the default behaviour.  I know it's
>> going to be a pain to update all callers and targets, sorry.
>
> Richard, here is an updated version of this patch.  It is not
> completely tested yet but I wanted to send this out and make
> sure it is what you had in mind and see if you had any comments about
> the new target function while I am testing it (including building
> some of the other targets).

Yeah, this was the kind of thing I had in mind, thanks.
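For anyone following the thread, the case this is trying to catch is
roughly the following, assuming the aarch64_vector_pcs attribute from
the earlier patches in the series (just an illustration, not taken
from the patch):

  #include <arm_neon.h>

  void __attribute__ ((aarch64_vector_pcs)) simd_fn (void);

  float32x4_t
  foo (float32x4_t x)
  {
    /* The vector PCS preserves V8-V23 in full, so the 128-bit value
       in X can stay in a register across the call.  Under the base
       PCS only the low 64 bits of V8-V15 survive, so X would have to
       be spilled and restored.  */
    simd_fn ();
    return x;
  }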
> /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The callee only saves
>    the lower 64 bits of a 128-bit register.  Tell the compiler the callee
>    clobbers the top 64 bits when restoring the bottom 64 bits.  */
>
>  static bool
> -aarch64_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
> +aarch64_hard_regno_call_part_clobbered (rtx_insn *insn,
> +					unsigned int regno,
> +					machine_mode mode)
>  {
> +  if (insn && CALL_P (insn) && aarch64_simd_call_p (insn))
> +    return false;
>    return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), 8);

This should be choosing between 8 and 16 for the maybe_gt, since even
SIMD functions clobber bits 128 and above for SVE.

> +/* Implement TARGET_RETURN_CALL_WITH_MAX_CLOBBERS.  */
> +
> +rtx_insn *
> +aarch64_return_call_with_max_clobbers (rtx_insn *call_1, rtx_insn *call_2)
> +{
> +  gcc_assert (CALL_P (call_1));
> +  if ((call_2 == NULL_RTX) || aarch64_simd_call_p (call_2))
> +    return call_1;
> +  else
> +    return call_2;

Nit: redundant parens in "(call_2 == NULL_RTX)".

> diff --git a/gcc/config/avr/avr.c b/gcc/config/avr/avr.c
> index 023308b..2cf993d 100644
> --- a/gcc/config/avr/avr.c
> +++ b/gcc/config/avr/avr.c
> @@ -12181,7 +12181,9 @@ avr_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
>  /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  */
>
>  static bool
> -avr_hard_regno_call_part_clobbered (unsigned regno, machine_mode mode)
> +avr_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
> +				    unsigned regno,
> +				    machine_mode mode)
>  {

Also very minor, sorry, but: I think it's usual to put the parameters
on the same line when they fit.  Same for the other hooks.

> @@ -2919,7 +2930,7 @@ the local anchor could be shared by other accesses to nearby locations.
>
>  The hook returns true if it succeeds, storing the offset of the
>  anchor from the base in @var{offset1} and the offset of the final address
> -from the anchor in @var{offset2}.  The default implementation returns false.
> +from the anchor in @var{offset2}.  ehe defnult implementation returns false.
>  @end deftypefn

Stray change here.

> diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
> index 7ffcd35..31a567a 100644
> --- a/gcc/lra-constraints.c
> +++ b/gcc/lra-constraints.c
> @@ -5368,16 +5368,24 @@ inherit_reload_reg (bool def_p, int original_regno,
>  static inline bool
>  need_for_call_save_p (int regno)
>  {
> +  machine_mode pmode = PSEUDO_REGNO_MODE (regno);
> +  int new_regno = reg_renumber[regno];
> +
>    lra_assert (regno >= FIRST_PSEUDO_REGISTER && reg_renumber[regno] >= 0);
> -  return (usage_insns[regno].calls_num < calls_num
> -	  && (overlaps_hard_reg_set_p
> -	      ((flag_ipa_ra &&
> -		! hard_reg_set_empty_p (lra_reg_info[regno].actual_call_used_reg_set))
> -	       ? lra_reg_info[regno].actual_call_used_reg_set
> -	       : call_used_reg_set,
> -	       PSEUDO_REGNO_MODE (regno), reg_renumber[regno])
> -	      || (targetm.hard_regno_call_part_clobbered
> -		  (reg_renumber[regno], PSEUDO_REGNO_MODE (regno)))));
> +
> +  if (usage_insns[regno].calls_num >= calls_num)
> +    return false;
> +
> +  if (flag_ipa_ra
> +      && !hard_reg_set_empty_p (lra_reg_info[regno].actual_call_used_reg_set))
> +    return (overlaps_hard_reg_set_p
> +	    (lra_reg_info[regno].actual_call_used_reg_set, pmode, new_regno)
> +	    || targetm.hard_regno_call_part_clobbered
> +	       (lra_reg_info[regno].call_insn, new_regno, pmode));
> +  else
> +    return (overlaps_hard_reg_set_p (call_used_reg_set, pmode, new_regno)
> +	    || targetm.hard_regno_call_part_clobbered
> +	       (lra_reg_info[regno].call_insn, new_regno, pmode));
>  }

I think it'd be safer just to add the new parameter to the existing
code, rather than rework it like this.

I'm not sure off-hand why the existing code tests
targetm.hard_regno_call_part_clobbered when we have
actual_call_used_reg_set.  Seems like it shouldn't be necessary if we
know exactly which registers the function clobbers.  Changing that
should probably be a separate follow-on patch though.
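I.e. for the first point, keep the original shape and just thread the
insn through, something like this (untested, using the new call_insn
field from the patch):

  return (usage_insns[regno].calls_num < calls_num
	  && (overlaps_hard_reg_set_p
	      ((flag_ipa_ra
		&& ! hard_reg_set_empty_p (lra_reg_info[regno].actual_call_used_reg_set))
	       ? lra_reg_info[regno].actual_call_used_reg_set
	       : call_used_reg_set,
	       PSEUDO_REGNO_MODE (regno), reg_renumber[regno])
	      || (targetm.hard_regno_call_part_clobbered
		  (lra_reg_info[regno].call_insn, reg_renumber[regno],
		   PSEUDO_REGNO_MODE (regno)))));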
> /* Global registers occurring in the current EBB.  */
> @@ -635,6 +637,7 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
>    rtx link, *link_loc;
>    bool need_curr_point_incr;
>    HARD_REG_SET last_call_used_reg_set;
> +  rtx_insn *call_insn;
>
>    reg_live_out = df_get_live_out (bb);
>    sparseset_clear (pseudos_live);
> @@ -658,6 +661,17 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
>    if (lra_dump_file != NULL)
>      fprintf (lra_dump_file, "  BB %d\n", bb->index);
>
> +  call_insn = NULL;
> +  if (targetm.return_call_with_max_clobbers)
> +    {
> +      FOR_BB_INSNS_REVERSE_SAFE (bb, curr_insn, next)
> +	{
> +	  if (CALL_P (curr_insn))
> +	    call_insn = targetm.return_call_with_max_clobbers (curr_insn,
> +							       call_insn);
> +	}
> +    }
> +
>    /* Scan the code of this basic block, noting which pseudos and hard
>       regs are born or die.
>
> @@ -847,7 +861,8 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
> 	    update_pseudo_point (reg->regno, curr_point, USE_POINT);
> 	    mark_regno_live (reg->regno, reg->biggest_mode);
> 	    check_pseudos_live_through_calls (reg->regno,
> -					      last_call_used_reg_set);
> +					      last_call_used_reg_set,
> +					      call_insn);
> 	  }
>
>        if (!HARD_REGISTER_NUM_P (reg->regno))
> @@ -912,9 +927,13 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
> 	      {
> 		IOR_HARD_REG_SET (lra_reg_info[j].actual_call_used_reg_set,
> 				  this_call_used_reg_set);
> +
> +		lra_reg_info[j].call_insn = curr_insn;
> +
> 		if (flush)
> -		  check_pseudos_live_through_calls
> -		    (j, last_call_used_reg_set);
> +		  check_pseudos_live_through_calls (j,
> +						    last_call_used_reg_set,
> +						    call_insn);
> 	      }
> 	    COPY_HARD_REG_SET(last_call_used_reg_set, this_call_used_reg_set);
> 	  }
> @@ -956,7 +975,8 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
> 	    update_pseudo_point (reg->regno, curr_point, USE_POINT);
> 	    mark_regno_live (reg->regno, reg->biggest_mode);
> 	    check_pseudos_live_through_calls (reg->regno,
> -					      last_call_used_reg_set);
> +					      last_call_used_reg_set,
> +					      call_insn);
> 	  }
>
>        for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
> @@ -1125,7 +1145,7 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
> 	  if (sparseset_cardinality (pseudos_live_through_calls) == 0)
> 	    break;
> 	  if (sparseset_bit_p (pseudos_live_through_calls, j))
> -	    check_pseudos_live_through_calls (j, last_call_used_reg_set);
> +	    check_pseudos_live_through_calls (j, last_call_used_reg_set, call_insn);
>        }
>
>    for (i = 0; HARD_REGISTER_NUM_P (i); ++i)

I think we can do this more accurately by instead keeping track of the
current call during the main block walk and extending this:

	  if (! flag_ipa_ra)
	    COPY_HARD_REG_SET(last_call_used_reg_set, call_used_reg_set);
	  else
	    {

so that we use the "else" when targetm.return_call_with_max_clobbers
is nonnull.  Then we should extend this:

	      bool flush = (! hard_reg_set_empty_p (last_call_used_reg_set)
			    && ! hard_reg_set_equal_p (last_call_used_reg_set,
						       this_call_used_reg_set))

so that we flush when the old call insn preserves more registers than
the new one.

Also, I think:

		lra_reg_info[j].call_insn = curr_insn;

should happen in check_pseudos_live_through_calls and should apply
targetm.return_call_with_max_clobbers to the register's existing
call_insn (if any), rather than simply overwrite it.  That way the
register info will track the "worst" call for all regions in which the
register is live.
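I.e. something like this in check_pseudos_live_through_calls, assuming
it takes the call_insn parameter from the patch (just a sketch,
untested):

  /* Remember the "worst" call that the register is live through,
     i.e. the one that clobbers the most registers, rather than simply
     the most recently seen one.  */
  if (lra_reg_info[regno].call_insn && targetm.return_call_with_max_clobbers)
    lra_reg_info[regno].call_insn
      = targetm.return_call_with_max_clobbers (lra_reg_info[regno].call_insn,
					       call_insn);
  else
    lra_reg_info[regno].call_insn = call_insn;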
> diff --git a/gcc/regrename.c b/gcc/regrename.c
> index a180ced..109add0 100644
> --- a/gcc/regrename.c
> +++ b/gcc/regrename.c
> @@ -339,9 +339,9 @@ check_new_reg_p (int reg ATTRIBUTE_UNUSED, int new_reg,
> 	   && ! DEBUG_INSN_P (tmp->insn))
> 	  || (this_head->need_caller_save_reg
> 	      && ! (targetm.hard_regno_call_part_clobbered
> -		    (reg, GET_MODE (*tmp->loc)))
> +		    (tmp->insn, reg, GET_MODE (*tmp->loc)))
> 	      && (targetm.hard_regno_call_part_clobbered
> -		  (new_reg, GET_MODE (*tmp->loc)))))
> +		  (tmp->insn, new_reg, GET_MODE (*tmp->loc)))))
> 	return false;
>
>    return true;

tmp->insn isn't the call we care about here.  I think we should just
pass null.

> diff --git a/gcc/reload1.c b/gcc/reload1.c
> index b703402..5490ae5 100644
> --- a/gcc/reload1.c
> +++ b/gcc/reload1.c
> @@ -8289,7 +8289,8 @@ emit_reload_insns (struct insn_chain *chain)
> 			   : out_regno + k);
> 	      reg_reloaded_insn[regno + k] = insn;
> 	      SET_HARD_REG_BIT (reg_reloaded_valid, regno + k);
> -	      if (targetm.hard_regno_call_part_clobbered (regno + k,
> +	      if (targetm.hard_regno_call_part_clobbered (insn,
> +							  regno + k,
> 							  mode))
> 		SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
> 				  regno + k);
> @@ -8369,7 +8370,8 @@ emit_reload_insns (struct insn_chain *chain)
> 			   : in_regno + k);
> 	      reg_reloaded_insn[regno + k] = insn;
> 	      SET_HARD_REG_BIT (reg_reloaded_valid, regno + k);
> -	      if (targetm.hard_regno_call_part_clobbered (regno + k,
> +	      if (targetm.hard_regno_call_part_clobbered (insn,
> +							  regno + k,
> 							  mode))
> 		SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
> 				  regno + k);
> @@ -8485,7 +8487,7 @@ emit_reload_insns (struct insn_chain *chain)
> 	      CLEAR_HARD_REG_BIT (reg_reloaded_dead, src_regno + k);
> 	      SET_HARD_REG_BIT (reg_reloaded_valid, src_regno + k);
> 	      if (targetm.hard_regno_call_part_clobbered
> -		  (src_regno + k, mode))
> +		  (insn, src_regno + k, mode))
> 		SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
> 				  src_regno + k);
> 	      else

The insns in this case might not be call instructions.  I think for
the legacy reload.c, reload1.c and caller-save.c we can just pass NULL
rather than try to be optimal.

> diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
> index e15cf08..53c2e26 100644
> --- a/gcc/sched-deps.c
> +++ b/gcc/sched-deps.c
> @@ -3728,7 +3728,8 @@ deps_analyze_insn (struct deps_desc *deps, rtx_insn *insn)
> 		   Since we only have a choice between 'might be clobbered'
> 		   and 'definitely not clobbered', we must include all
> 		   partly call-clobbered registers here.  */
> -		else if (targetm.hard_regno_call_part_clobbered (i,
> +		else if (targetm.hard_regno_call_part_clobbered (insn,
> +								 i,
> 								 reg_raw_mode[i])
> 			 || TEST_HARD_REG_BIT (regs_invalidated_by_call, i))
> 		  SET_REGNO_REG_SET (reg_pending_clobbers, i);

No need to split the line after "insn".

> diff --git a/gcc/target.def b/gcc/target.def
> index e8f0f70..ecb0ea7 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -5772,8 +5772,21 @@ return true for a 64-bit mode but false for a 32-bit mode.\n\
>  \n\
>  The default implementation returns false, which is correct\n\
>  for targets that don't have partly call-clobbered registers.",
> - bool, (unsigned int regno, machine_mode mode),
> - hook_bool_uint_mode_false)
> + bool, (rtx_insn *, unsigned int regno, machine_mode mode),
> + hook_bool_insn_uint_mode_false)

Should name the insn parameter and describe it in the docs.
(Realise this is just a first cut.)

> +DEFHOOK
> +(return_call_with_max_clobbers,
> + "This hook returns a pointer to the call that partially clobbers the\n\
> +most registers.  If a platform supports multiple ABIs where the registers\n\
> +that are partially clobbered may vary, this function compares two\n\
> +two calls and return a pointer to the one that clobbers the most registers.\n\

s/two two calls and return/two calls and returns/
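For the naming point above, I mean something like (sketch only):

  bool, (rtx_insn *insn, unsigned int regno, machine_mode mode),
  hook_bool_insn_uint_mode_false)

with the documentation saying what @var{insn} is and noting that a
null @var{insn} selects the default, insn-independent behaviour.

Thanks,
Richard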