Hi Jeff,

Thanks for your comments. I have a few questions about points I don't quite understand.
> One of the things that needs to be upstreamed is long jump support within
> a function. Essentially once a function reaches 1M in size we have the
> real possibility that a direct jump may not reach its target.
>
> To support this I expect that $ra is going to become a fixed register (ie,
> not available to the register allocator as a temporary). It'll be used
> as a scratch register for long jump sequences.
>
> One of the consequences of this is $ra will need to be saved in leaf
> functions that are near or over 1M in size.
>
> Note that at the time when we have to lay out the stack, we do not know
> the precise length of the function. So there's a degree of "fuzz" in the
> decision whether or not to save $ra in a function that is close to the 1M
> limit.

Do you mean that a long jump beyond a 1M offset will need multiple jal
instructions, and that each jal will write $ra? If so, I'm confused about
how saving $ra affects the function prologue. We save fp+ra in the
prologue, so a later write to $ra does not seem to modify the copy of $ra
that has already been saved.

> I don't think you can reliably know if $ra is valid in an arbitrary leaf
> function or not. You could implement some heuristics by looking at the
> symbol table (which I'm guessing you don't want to do) or by
> disassembling the prologue (again, I'm guessing you don't want to do that
> either).

I agree that $ra is not valid if we try to read the return address to the
parent function directly from $ra in the function body. But if fp and ra
are saved in the prologue, we can get the correct return address from fp
plus an offset, right?

> Meaning that what you really want is to be using -fno-omit-frame-pointer
> and for $ra to always be saved in the stack, even in a leaf function.

This is another possible solution, but it would change the default
behavior of -fno-omit-frame-pointer.

> Presumably you're not suggesting any of these options be used in general
> -- they're going to be used for things like embedded devices or firmware?
> Also note there are low overhead unwinding schemes out there that are
> already supported in various tools -- ORC & SFRAME come
> immediately to mind. Those may be better than building a bespoke
> solution for the embedded space.

Yes, you're right; I forgot to explain the background of the requirement.
It will be used in firmware, where DWARF or other unwind information may
not be acceptable.

Yanzhang

> -----Original Message-----
> From: Jeff Law <jeffreya...@gmail.com>
> Sent: Wednesday, June 7, 2023 10:13 AM
> To: Wang, Yanzhang <yanzhang.w...@intel.com>; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Li, Pan2
> <pan2...@intel.com>
> Subject: Re: [PATCH] RISCV: Add -m(no)-omit-leaf-frame-pointer support.
>
> On 6/4/23 20:49, Wang, Yanzhang wrote:
> > Hi Jeff,
> >
> > Yes, there's a requirement to support backtrace based on the fp+ra.
> > And the unwind/cfa is not acceptable because it will add additional
> > sections to the binary. Currently, -fno-omit-frame-pointer can not
> > save the ra for the leaf function. So we need to add another option
> > like ARM/X86 to support consistent fp+ra stack layout for the leaf and
> > non-leaf functions.
> One of the things that needs to be upstreamed is long jump support within
> a function. Essentially once a function reaches 1M in size we have the
> real possibility that a direct jump may not reach its target.
>
> To support this I expect that $ra is going to become a fixed register (ie,
> not available to the register allocator as a temporary). It'll be used
> as a scratch register for long jump sequences.
>
> One of the consequences of this is $ra will need to be saved in leaf
> functions that are near or over 1M in size.
>
> Note that at the time when we have to lay out the stack, we do not know
> the precise length of the function. So there's a degree of "fuzz" in the
> decision whether or not to save $ra in a function that is close to the 1M
> limit.
>
> I don't think you can reliably know if $ra is valid in an arbitrary leaf
> function or not. You could implement some heuristics by looking at the
> symbol table (which I'm guessing you don't want to do) or by
> disassembling the prologue (again, I'm guessing you don't want to do that
> either).
>
> Meaning that what you really want is to be using -fno-omit-frame-pointer
> and for $ra to always be saved in the stack, even in a leaf function.
>
> Presumably you're not suggesting any of these options be used in general
> -- they're going to be used for things like embedded devices or firmware?
> Also note there are low overhead unwinding schemes out there that are
> already supported in various tools -- ORC & SFRAME come
> immediately to mind. Those may be better than building a bespoke
> solution for the embedded space.
>
> Jeff