Hi,

Let me make some comments from the kernel side.

On Thu, Apr 28, 2016 at 11:58:25AM +0100, Szabolcs Nagy wrote:
> On 28/04/16 09:47, Maxim Kuvyrkov wrote:
> >> On Apr 27, 2016, at 7:26 PM, Szabolcs Nagy <szabolcs.n...@arm.com> wrote:
> >>
> >> with -mfentry, by default the user only has to
> >> implement the fentry call (linux wants nops there, but
> >> e.g. glibc could use -pg -mfentry for profiling on
> >> aarch64 and the target specific details are easier to
> >> document for an -m option than for something general).
> > 
> > I don't understand your point here, could you elaborate, please?
> > 
> 
> if we only provide -mfentry then
> 
> - the kernel can use it (they have tools to nop patch the binary),

Do you mean scripts/recordmcount.{c,pl}?
This tool is intended to generate the __mcount_loc section, which contains
a list of the locations of the mcount/fentry call sites. It does not patch
the kernel code itself.
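
To illustrate the idea only, here is a minimal sketch in C of such a
call-site list: a table of addresses that can be counted and searched
without touching the code it describes. The struct and function names
(callsite_table, callsite_count, is_callsite) are hypothetical and are not
taken from the kernel sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one entry per recorded mcount/fentry call site. */
struct callsite_table {
    const uintptr_t *start;  /* first recorded address */
    const uintptr_t *stop;   /* one past the last recorded address */
};

/* Count the recorded call sites; nothing is patched here. */
static size_t callsite_count(const struct callsite_table *t)
{
    return (size_t)(t->stop - t->start);
}

/* Return 1 if addr is a recorded call site, 0 otherwise. */
static int is_callsite(const struct callsite_table *t, uintptr_t addr)
{
    for (const uintptr_t *p = t->start; p < t->stop; p++)
        if (*p == addr)
            return 1;
    return 0;
}
```

The point of the sketch is that recording locations and patching code are
separate steps: this table is read-only data.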

> - others who don't want to fiddle with nops, just have the call,
> can also use it (e.g. user-space profiling cannot really use
> something that needs binary patching in case the user prefers
> -pg -mfentry over the current -pg behaviour).

Well, -mfentry is simple and works perfectly on x86, but it does not seem to
be the best fit for arm, given that -mfentry means inserting a call site at
the very beginning of a function. See the thread discussing -mfentry on arm64.

> - it's target specific, so the magic abi of the fentry call can
> be documented by the target according to the specific instruction
> sequence that is used. (with nop-padding there are psabi and
> compiler optimization interactions that may be hard to document
> in a generic way and letting the user figure it out may cause
> problems later in compiler development.. but i'm just speculating
> based on the powerpc toc handling and ipa-ra findings.)
> 
> >> the nop-padding is more general, but the size and
> >> layout of nops and the call abi will be target
> >> specific and the user will most likely need to modify
> >> the binary (to get the right sequence) which needs
> >> additional tooling.  i don't know who might use it
> >> other than linux (which already has tools to deal with
> >> -mfentry).

Please note that code patching (nop-padding) is entirely up to the kernel
and its arch-specific code. The kernel will do that either
- when kernel ftrace is initialized, or
- dynamically at runtime, at the user's request (through sysfs)

The tool (recordmcount) will never interact with the kernel at runtime.
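
As a rough sketch of those two patching moments, assuming a toy "text"
array and made-up instruction encodings (INSN_CALL and INSN_NOP are
placeholders, not real opcodes, and the function names are hypothetical
rather than kernel APIs):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder encodings for illustration only; not real instruction words. */
#define INSN_CALL 0xCA11u
#define INSN_NOP  0x0000u

/* Initialization step: turn every recorded call site into a NOP. */
static void init_nop_all(uint32_t *text, const size_t *sites, size_t n)
{
    for (size_t i = 0; i < n; i++)
        text[sites[i]] = INSN_NOP;
}

/* Runtime step: enable or disable tracing at one recorded site. */
static void set_site_enabled(uint32_t *text, size_t site, int enable)
{
    text[site] = enable ? INSN_CALL : INSN_NOP;
}
```

Both steps only consume the recorded call-site indices; what a real
architecture writes at each site is decided by its arch-specific code.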

> > 
> > Right, but this tooling will require minimal (if any) changes
> > to be adapted to nop-pad approach.  If I remember correctly,
> > recent versions of GCC and kernel for x86_64 generate NOPs,
> > not the call sequence in the prologs when -mfentry is used.

I think that Maxim was referring to the following x86-specific gcc options:
- -mrecord-mcount
- -mnop-mcount
but as far as I can tell, the current kernel does not use these
options.

> i'm trying to find where this happens in the kernel, but
> i only see scripts/recordmcount.{c,pl} which are based on
> nop patching the fentry/mcount call sites.
> 
> without such call sites the tools have to be implemented
> differently and the way the kernel records the call site
> positions might not match the prolog-pad recording.

Where the call site resides in a given nop sequence will depend on the
architecture, but again, this issue can be handled by arch-specific code.

Thanks,
-Takahiro AKASHI
