Qing Zhao <qing.z...@oracle.com> writes: >> On Sep 11, 2020, at 4:44 PM, Richard Sandiford <richard.sandif...@arm.com> >> wrote: >> >> Qing Zhao <qing.z...@oracle.com> writes: >>>> On Sep 11, 2020, at 12:32 PM, Richard Sandiford >>>> <richard.sandif...@arm.com> >> If we go for (2), then I think it would be >>>> better to do >>>> it at the start of pass_late_compilation instead. (Some targets wouldn't >>>> cope with doing it later.) The reason for doing it so late is that the >>>> set of used “volatile”/caller-saved registers is not fixed at prologue >>>> and epilogue generation: later optimisation passes can introduce uses >>>> of volatile registers that weren't used previously. (Sorry if this >>>> has already been suggested.) >>> >>> Yes, I agree. >>> >>> I thought that it might be better to move this task at the very late of the >>> RTL stage, i.e, before “final” phase. >>> >>> Another solution is (discussed with Hongjiu): >>> >>> 1. Define a new target hook: >>> >>> targetm.return_with_zeroing(bool simple_return_p, HARD_REG_SET >>> need_zeroed_hardregs, bool gpr_only) >>> >>> 2. Add the following routine in middle end: >>> >>> rtx_insn * >>> generate_return_rtx (bool simple_return_p) >>> { >>> if (targetm.return_with_zeroing) >>> { >>> Compute the hardregs set for clearing into “need_zeroed_hardregs”; >>> return targetm.return_with_zeroing (simple_return_p, >>> need_zeroed_hardregs, gpr_only); >>> } >>> else >>> { >>> if (simple_return_p) >>> return targetm.gen_simple_return ( ); >>> else >>> return targetm.gen_return (); >>> } >>> } >>> >>> Then replace all call to “targetm.gen_simple_return” and >>> “targetm.gen_return” to “generate_return_rtx()”. >>> >>> 3. In the target, >>> Implement “return_with_zeroing”. >>> >>> >>> Let me know your comments on this. >> >> I think having a separate pass is better. We don't normally know >> at the point of generating the return which registers will need >> to be cleared. > > At the point of generating the return, we can compute the > “need_zeroed_hardregs” HARD_REG_SET > by using data flow information, the function abi of the routine, and also the > user option and source code > attribute information together. These information should be available at each > point when generating the return.
Like I mentioned earlier though, passes that run after pass_thread_prologue_and_epilogue can use call-clobbered registers that weren't previously used. For example, on x86_64, the function might not use %r8 when the prologue, epilogue and returns are generated, but pass_regrename might later introduce a new use of %r8. AIUI, the “used” version of the new command-line option is supposed to clear %r8 in these circumstances, but it wouldn't do so if the data was collected at the point that the return is generated. That's why I think it's more robust to do this later (at the beginning of pass_late_compilation) and insert the zeroing before returns that already exist. >> So IMO the pass should just search for all the >> returns in a function and insert the zeroing instructions before >> each one. > > I was considering this approach too for some time, however, there is one > major issue with this as > Segher mentioned, The middle end does not know some details on the registers, > lacking such > detailed information might result incorrect code generation at middle end. > > For example, on x86_64 target, when “return” with pop, the scratch register > “ECX” will be > used for returning, then it’s incorrect to zero “ecx” before generating the > return. Since middle end > doesn't have such information, it cannot avoid to zero “ecx”. Therefore > incorrect code might be > generated. > > Segher also mentioned that on Power, there are some scratch registers also > are used for > Other purpose, clearing them before return is not correct. But the dataflow information has to be correct between pass_thread_prologue_and_epilogue and pass_free_cfg, otherwise any pass in that region could clobber the registers in the same way. To get the registers that are live before the return, you can start with the registers that are live out from the block that contains the return, then “simulate” the return instruction backwards to get the set of registers that are live before the return instruction (see df_simulate_one_insn_backwards). In the x86_64 case you mention, the pattern is: (define_insn "*simple_return_indirect_internal<mode>" [(simple_return) (use (match_operand:W 0 "register_operand" "r"))] "reload_completed" …) This (use …) tells the df machinery that the instruction needs operand 0 (= ecx). The process above would therefore realise that ecx can't be clobbered. Thanks, Richard