Hi, Richard,

> On Sep 11, 2020, at 5:56 PM, Richard Sandiford <richard.sandif...@arm.com> wrote:
>
> Qing Zhao <qing.z...@oracle.com> writes:
>>> On Sep 11, 2020, at 4:44 PM, Richard Sandiford <richard.sandif...@arm.com> wrote:
>>>
>>> Qing Zhao <qing.z...@oracle.com> writes:
>>>>> On Sep 11, 2020, at 12:32 PM, Richard Sandiford <richard.sandif...@arm.com> wrote:
>>>>>
>>>>> If we go for (2), then I think it would be better to do
>>>>> it at the start of pass_late_compilation instead. (Some targets wouldn't
>>>>> cope with doing it later.) The reason for doing it so late is that the
>>>>> set of used “volatile”/caller-saved registers is not fixed at prologue
>>>>> and epilogue generation: later optimisation passes can introduce uses
>>>>> of volatile registers that weren't used previously. (Sorry if this
>>>>> has already been suggested.)
>>>>
>>>> Yes, I agree.
>>>>
>>>> I thought it might be better to move this task to the very end of
>>>> the RTL stage, i.e., just before the “final” pass.
>>>>
>>>> Another solution (discussed with Hongjiu) is:
>>>>
>>>> 1. Define a new target hook:
>>>>
>>>>    targetm.return_with_zeroing (bool simple_return_p,
>>>>                                 HARD_REG_SET need_zeroed_hardregs,
>>>>                                 bool gpr_only)
>>>>
>>>> 2. Add the following routine in the middle end:
>>>>
>>>>    rtx_insn *
>>>>    generate_return_rtx (bool simple_return_p)
>>>>    {
>>>>      if (targetm.return_with_zeroing)
>>>>        {
>>>>          /* Compute the set of hard registers to clear into
>>>>             need_zeroed_hardregs.  */
>>>>          return targetm.return_with_zeroing (simple_return_p,
>>>>                                              need_zeroed_hardregs,
>>>>                                              gpr_only);
>>>>        }
>>>>      else
>>>>        {
>>>>          if (simple_return_p)
>>>>            return targetm.gen_simple_return ();
>>>>          else
>>>>            return targetm.gen_return ();
>>>>        }
>>>>    }
>>>>
>>>>    Then replace all calls to “targetm.gen_simple_return” and
>>>>    “targetm.gen_return” with “generate_return_rtx ()”.
>>>>
>>>> 3. In the target, implement “return_with_zeroing”.
>>>>
>>>> Let me know your comments on this.
>>>
>>> I think having a separate pass is better. We don't normally know
>>> at the point of generating the return which registers will need
>>> to be cleared.
>>
>> At the point of generating the return, we can compute the
>> “need_zeroed_hardregs” HARD_REG_SET by combining the data-flow
>> information, the function ABI of the routine, and the user option
>> and source-code attribute information. This information should be
>> available at every point where a return is generated.
>
> Like I mentioned earlier though, passes that run after
> pass_thread_prologue_and_epilogue can use call-clobbered registers that
> weren't previously used. For example, on x86_64, the function might
> not use %r8 when the prologue, epilogue and returns are generated,
> but pass_regrename might later introduce a new use of %r8. AIUI,
> the “used” version of the new command-line option is supposed to clear
> %r8 in these circumstances, but it wouldn't do so if the data was
> collected at the point that the return is generated.
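A toy model makes the timing problem concrete. This is a minimal sketch, not GCC code: registers are bits in a 32-bit mask, the register numbering is hypothetical, and `late_rename` is a hypothetical stand-in for a pass like pass_regrename. The call-clobbered mask follows the x86_64 SysV caller-saved GPRs. A register set captured when the epilogue is generated misses a use of %r8 that a later pass introduces, while recomputing the set later catches it.

```c
#include <assert.h>
#include <stdint.h>

/* Toy register numbering (hypothetical, not GCC's hard register numbers).  */
enum toy_reg
{
  REG_AX, REG_BX, REG_CX, REG_DX, REG_SI, REG_DI,
  REG_R8, REG_R9, REG_R10, REG_R11
};

typedef uint32_t regset;              /* bit N set => register N is in the set */
#define REG_BIT(r) ((regset) 1u << (r))

/* Caller-saved GPRs in the x86_64 SysV ABI; %rbx is callee-saved.  */
static const regset CALL_CLOBBERED =
  REG_BIT (REG_AX) | REG_BIT (REG_CX) | REG_BIT (REG_DX) | REG_BIT (REG_SI)
  | REG_BIT (REG_DI) | REG_BIT (REG_R8) | REG_BIT (REG_R9)
  | REG_BIT (REG_R10) | REG_BIT (REG_R11);

#define MAX_TOY_INSNS 16

/* A toy function body: each insn records the registers it mentions.  */
struct toy_func
{
  regset insn_regs[MAX_TOY_INSNS];
  int n_insns;
};

/* What a "used"-style policy would clear: the call-clobbered registers
   that some insn mentions at the moment this function is called.  */
regset
used_call_clobbered (const struct toy_func *f)
{
  regset used = 0;
  for (int i = 0; i < f->n_insns; i++)
    used |= f->insn_regs[i];
  return used & CALL_CLOBBERED;
}

/* Hypothetical stand-in for a late pass such as pass_regrename:
   rewrite insn INSN to use register TO instead of register FROM.  */
void
late_rename (struct toy_func *f, int insn, enum toy_reg from, enum toy_reg to)
{
  assert (insn >= 0 && insn < f->n_insns);
  f->insn_regs[insn] = (f->insn_regs[insn] & ~REG_BIT (from)) | REG_BIT (to);
}
```

For a body whose first insn uses %rax and %rdx, a snapshot taken before `late_rename (&f, 0, REG_DX, REG_R8)` still contains %rdx (now unused, which is harmless) but lacks %r8, whereas calling `used_call_clobbered` again afterwards returns {%rax, %r8}.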
Thanks for the information.

>
> That's why I think it's more robust to do this later (at the beginning
> of pass_late_compilation) and insert the zeroing before returns that
> already exist.

Yes, it looks like it's not correct to insert the zeroing at the time
the prologue, epilogue and returns are generated.

I also checked that a “return” might be generated as late as
“pass_delay_slots”. So, should we move the new pass as late as
possible? Can I put it immediately before “pass_final”? What's the
latest place I can put it?

>
>>> So IMO the pass should just search for all the
>>> returns in a function and insert the zeroing instructions before
>>> each one.
>>
>> I was considering this approach too for some time; however, as Segher
>> mentioned, there is one major issue with it: the middle end does not
>> know some details about the registers, and lacking such detailed
>> information might result in incorrect code generation in the middle end.
>>
>> For example, on the x86_64 target, when a “return” with pop is used, the
>> scratch register “ECX” is used for returning, so it's incorrect to zero
>> “ecx” before generating the return. Since the middle end doesn't have
>> such information, it cannot avoid zeroing “ecx”, and incorrect code
>> might therefore be generated.
>>
>> Segher also mentioned that on Power, some scratch registers are also
>> used for other purposes, so clearing them before the return is not
>> correct.
>
> But the dataflow information has to be correct between
> pass_thread_prologue_and_epilogue and pass_free_cfg, otherwise
> any pass in that region could clobber the registers in the same way.

You mean the data-flow information will not be correct after
pass_free_cfg? “pass_delay_slots” runs after “pass_free_cfg”, and a new
“return” might be generated in “pass_delay_slots”. If we want to
generate zeroing for such a new “return”, can we do so correctly?
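The approach Richard suggests, scanning for the returns that exist when the pass runs and inserting the zeroing before each one, can be sketched with a toy instruction list. This is a minimal illustration under stated assumptions, not GCC's RTL API: `struct insn`, `insert_at` and `zero_regs_before_returns` are hypothetical names, and the set of registers to clear is passed in rather than computed from data-flow information.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t regset;              /* bit N set => register N is in the set */

/* Hypothetical toy insn kinds; a real pass would walk RTL insns.  */
enum insn_kind { INSN_OTHER, INSN_ZERO_REG, INSN_RETURN };

struct insn
{
  enum insn_kind kind;
  int reg;                            /* register cleared by INSN_ZERO_REG */
};

#define MAX_INSNS 64

struct insn_list
{
  struct insn insns[MAX_INSNS];
  size_t n;
};

/* Insert INS at position POS, shifting the later insns up by one.  */
static void
insert_at (struct insn_list *l, size_t pos, struct insn ins)
{
  assert (l->n < MAX_INSNS && pos <= l->n);
  for (size_t i = l->n; i > pos; i--)
    l->insns[i] = l->insns[i - 1];
  l->insns[pos] = ins;
  l->n++;
}

/* Before every return present when this runs, clear each register in
   TO_ZERO (in ascending register order).  Returns the number of zeroing
   insns added.  */
size_t
zero_regs_before_returns (struct insn_list *l, regset to_zero)
{
  size_t added = 0;
  for (size_t i = 0; i < l->n; i++)
    {
      if (l->insns[i].kind != INSN_RETURN)
        continue;
      for (int r = 0; r < 32; r++)
        if (to_zero & ((regset) 1u << r))
          {
            insert_at (l, i, (struct insn) { INSN_ZERO_REG, r });
            i++;                      /* the return moved one slot later */
            added++;
          }
    }
  return added;
}
```

Because the scan only sees the insns present when it runs, a return created by a later pass (for example in pass_delay_slots) would get no zeroing, which is why the placement of the pass matters.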
>
> To get the registers that are live before the return, you can start with
> the registers that are live out from the block that contains the return,
> then “simulate” the return instruction backwards to get the set of
> registers that are live before the return instruction
> (see df_simulate_one_insn_backwards).

Okay.

Currently, I am using the following to check whether a register is live
out of the block that contains the return:

/* Return true if the hard register REGNO is live out of any
   predecessor of the exit block of the current routine.  */
static bool
is_live_reg_at_exit (unsigned int regno)
{
  edge e;
  edge_iterator ei;

  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
    {
      bitmap live_out = df_get_live_out (e->src);
      if (REGNO_REG_SET_P (live_out, regno))
        return true;
    }

  return false;
}

Is this correct?

>
> In the x86_64 case you mention, the pattern is:
>
> (define_insn "*simple_return_indirect_internal<mode>"
>   [(simple_return)
>    (use (match_operand:W 0 "register_operand" "r"))]
>   "reload_completed"
>   …)
>
> This (use …) tells the df machinery that the instruction needs
> operand 0 (= ecx). The process above would therefore realise
> that ecx can't be clobbered.

Okay, I see. The df information will reflect this, so no special
handling is needed here.

However, for the cases on Power that Segher mentioned, where some scratch
registers are also used for other purposes, I'm not sure whether we can
correctly generate the zeroing in the middle end.

Thanks

Qing

>
> Thanks,
> Richard
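The backward simulation Richard describes, and the effect of the (use …) in the x86_64 return pattern, can be sketched with a toy model. Assumptions: registers are bits in a 32-bit mask, each toy insn lists the registers it defines and uses, and all names here are hypothetical. The update rule mirrors what df_simulate_one_insn_backwards does on real RTL (remove the insn's definitions from the live set, then add its uses), but the real function also applies fixups that this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t regset;              /* bit N set => register N is in the set */
#define REG_BIT(r) ((regset) 1u << (r))

/* Toy register numbering (hypothetical, not GCC's).  */
enum { REG_AX, REG_CX, REG_DX, REG_R8 };

/* A toy insn: the registers it writes and the registers it reads.
   A (use (reg)) in the pattern counts as a read.  */
struct toy_insn
{
  regset defs;
  regset uses;
};

/* Live before INSN = (live after INSN minus its defs) plus its uses.  */
regset
toy_simulate_backwards (const struct toy_insn *insn, regset live)
{
  live &= ~insn->defs;
  live |= insn->uses;
  return live;
}

/* Registers from CANDIDATES that can be cleared immediately before RET,
   given the live-out set of the block that RET ends.  */
regset
regs_safe_to_zero_before_return (regset block_live_out,
                                 const struct toy_insn *ret,
                                 regset candidates)
{
  assert (ret != NULL);
  regset live_before = toy_simulate_backwards (ret, block_live_out);
  return candidates & ~live_before;
}
```

In this model, %eax stands in for a register that is live out of the block (for instance, an assumed return value). With a return that carries a use of %ecx, only %edx and %r8 are safe to clear; with a plain return (no uses), %ecx would be safe as well. The (use …) alone removes %ecx, which matches Richard's point that this pattern needs no target-specific special case.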