On Mon, 23 Dec 2024, Florian Weimer via Gcc wrote:

> * Alexander Monakov:
> 
> > On Mon, 16 Dec 2024, Florian Weimer via Gcc wrote:
> >
> >> I would like to provide a facility to create wrapper functions without
> >> lots of argument shuffling.  To achieve that, the wrapping function and
> >> the wrapped function should have the same prototype.  There will be a
> >> trampoline that puts additional data somewhere (possibly including the
> >> address of the wrapped function, but that interpretation is up to the
> >> wrapping function) and then transfers control to the wrapper function
> >> with an indirect jump (tail call).
> >> 
> >> For signal safety, I think the hidden argument needs to be in a register
> >> (instead of, say, thread-local storage).  Most System V ABI variants
> >> seem to reserve a register for use by the dynamic linker, or for the
> >> static chain pointer of nested functions.
> >> 
> >> Is there a way to reuse either register for this purpose and assign it
> >> to a local variable reliably at the start of the wrapper function
> >> implementation?
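The calling pattern can be modeled in portable C. This is only a sketch. `struct tramp_data`, `struct wrap_ctx`, `model_trampoline` and `counting_wrapper` are hypothetical names. The "hidden register" is a plain global here, which is exactly what the real design avoids because a global is neither signal-safe nor thread-safe. The indirect jump is modeled as an ordinary call.

```c
/* Hypothetical per-trampoline data: what the real asm stub would load
   with PC-relative addressing before jumping. */
struct tramp_data {
    void *param;               /* value placed in the hidden register */
    int (*wrapper)(int, int);  /* same prototype as the wrapped function */
};

/* Hypothetical context interpreted by the wrapper: here it carries the
   address of the wrapped function plus a call counter. */
struct wrap_ctx {
    int (*wrapped)(int, int);
    unsigned long calls;
};

/* Stand-in for the reserved register (e.g. r10 on x86-64). A global is
   used only so the model compiles as portable C. */
static void *model_hidden_reg;

/* Model of the stub: store the parameter, then transfer control. The
   real stub uses an indirect jump, so the wrapper returns straight to
   the original caller. */
static int model_trampoline(const struct tramp_data *d, int a, int b)
{
    model_hidden_reg = d->param;
    return d->wrapper(a, b);
}

static int add(int a, int b) { return a + b; }

/* Same prototype as add(), so a and b are forwarded without shuffling.
   The hidden value is read first, before anything can clobber it. */
static int counting_wrapper(int a, int b)
{
    struct wrap_ctx *ctx = model_hidden_reg;
    ctx->calls++;
    return ctx->wrapped(a, b);
}
```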
> >
> > Not in a way that will work with LLVM, I'm afraid, and with GCC
> > you'll have to shield wrappers from LTO:
> >
> > register void *r10 asm("r10");
> > void f(int, int);
> > void f_wrap(int a, int b)
> > {
> >     r10 = f;
> >     f(a, b);
> > }
> 
> Does this work on all primary GCC targets?

Yes? I'm not sure why you think it might not work. What are your concerns?

I don't think a suitable register is available on all targets, though.
In particular, on 32-bit x86 there is none if you consider -mregparm=3,
where eax, edx and ecx are used for passing arguments and the rest are
callee-saved.

Oh, and my favourite gotcha: indirect tailcalls are not a thing on powerpc64.

> > This is the only approach I'm aware of, apart from generating wrappers
> > in asm (speaking of, is there some reason that wouldn't work for you?).
> 
> You mean wrappers that inject the extra argument?  That doesn't work for
> variadic functions.

Why not? I find that generating a _tailcalling_ wrapper works much better
in asm than in C.

> It's also likely to break with unexpected calling
> conventions.  Variadic functions are always problematic because you
> can't directly forward to the original function.

Erm, what? Surely you can, by performing a tailcall (in asm).
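Below is a minimal sketch of such an asm tail-calling wrapper. It assumes x86-64 with the System V ABI on an ELF platform. `wrap_snprintf` and `wrap_calls` are hypothetical names, and snprintf stands in for any variadic function. The stub modifies only a memory counter and the flags. Every argument register, %al (the upper bound on vector registers used by a variadic call) and the stack therefore reach snprintf unchanged. On other targets the sketch falls back to a C va_list wrapper.

```c
#include <stdarg.h>
#include <stdio.h>

long wrap_calls;  /* hypothetical statistic bumped by the wrapper */

/* Same prototype as snprintf. */
int wrap_snprintf(char *buf, size_t size, const char *fmt, ...);

#if defined(__x86_64__) && defined(__LP64__) && defined(__ELF__)
/* Only memory and the flags change, so rdi, rsi, rdx, rcx, r8, r9, the
   vector registers, %al and all stack arguments arrive at snprintf as
   the caller left them. The jmp reuses the caller's return address:
   a genuine tail call, no argument copying at all. */
__asm__(".pushsection .text\n"
        ".globl wrap_snprintf\n"
        ".type wrap_snprintf, @function\n"
        "wrap_snprintf:\n"
        "    incq wrap_calls(%rip)\n"
        "    jmp snprintf@PLT\n"
        ".size wrap_snprintf, .-wrap_snprintf\n"
        ".popsection\n");
#else
/* Fallback for other targets: forward through a va_list instead. */
int wrap_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    wrap_calls++;
    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}
#endif
```

A double is passed in the test below, so the asm path also checks that %al and xmm0 survive the jump.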

> But at least you can
> write a wrapper around fprintf that forwards to vfprintf in C,
> without re-implementing fprintf argument parsing (for example).
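The following is a minimal sketch of that fprintf-to-vfprintf forwarding. `wrapped_fprintf` and the call counter are hypothetical, and the counter stands in for whatever the hook would do. The wrapper keeps fprintf's prototype, and the va_list lets vfprintf handle all of the format parsing.

```c
#include <stdarg.h>
#include <stdio.h>

static unsigned long fprintf_calls;  /* hypothetical hook state */

/* Same prototype as fprintf. C cannot re-pass "..." to fprintf itself,
   but the va_list can be handed to vfprintf unchanged. */
int wrapped_fprintf(FILE *stream, const char *format, ...)
{
    va_list ap;
    int ret;

    fprintf_calls++;  /* hook action runs before the real output */
    va_start(ap, format);
    ret = vfprintf(stream, format, ap);
    va_end(ap);
    return ret;
}
```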
> 
> If the assembler trampoline only has to store a configurable value in a
> fixed register and do an indirect jump, the trampolines are very
> regular.  This way, it is possible to create new trampolines at run time
> without run-time code generation.  All you need is a pre-built page (or
> a couple of pages) of trampoline code that loads parameter and address
> using PC-relative loads.  These trampoline pages can be mapped multiple
> times next to different data areas.
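Here is a sketch of the address arithmetic behind such pre-built trampoline pages. It assumes 4096-byte pages, 16-byte stubs and a data page mapped immediately after each copy of the code page. All names are hypothetical, and the "code page" here is ordinary heap memory that is never executed. Each stub finds its slot at its own address plus a constant, so identical code bytes work in every mapping.

```c
#include <stdlib.h>

#define TRAMP_PAGE_SIZE 4096u  /* assumed page size */
#define TRAMP_SLOT_SIZE 16u    /* assumed size of one asm stub */

/* Per-stub data, laid out with the same stride as the stubs. */
struct tramp_slot {
    void *param;           /* value the stub puts in the hidden register */
    void (*target)(void);  /* where the stub jumps afterwards */
};

_Static_assert(sizeof(struct tramp_slot) <= TRAMP_SLOT_SIZE,
               "data slot must fit in one stub-sized stride");

/* One mapping: a copy of the code page followed by its own data page. */
struct tramp_mapping {
    unsigned char *base;  /* [0, PAGE): stubs; [PAGE, 2*PAGE): data */
};

static int tramp_mapping_init(struct tramp_mapping *m)
{
    m->base = calloc(2, TRAMP_PAGE_SIZE);
    return m->base != NULL;
}

/* Address of stub i; i must be below TRAMP_PAGE_SIZE / TRAMP_SLOT_SIZE. */
static unsigned char *tramp_stub(const struct tramp_mapping *m, unsigned i)
{
    return m->base + (size_t)i * TRAMP_SLOT_SIZE;
}

/* What the stub's PC-relative load computes: its own address plus one
   page. The displacement is the same for every stub and every mapping. */
static struct tramp_slot *tramp_slot_for_stub(unsigned char *stub)
{
    return (struct tramp_slot *)(void *)(stub + TRAMP_PAGE_SIZE);
}
```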
> 
> Here's the background:
> 
> I'm looking for a possible replacement for the pltenter/pltexit wrappers
> in glibc's audit functionality.  The current approach breaks if
> procedure call standards evolve:

Okay, I guess there was a bit of XY problem at play here :)

As someone who analyzed the LD_AUDIT/LD_PROFILE segfault with IFUNC and
unsuccessfully tried to submit a fix for the related LD_AUDIT slowdown
back in (checks notes) 2013, I am fairly disappointed in how Glibc treated
LD_AUDIT. You've never shown much care for the use of LD_AUDIT in FOSS, and
only fixed the slowdown for the benefit of SPINDLE folks. In my opinion,
they wouldn't need LD_AUDIT if their system was somewhat reasonably
designed in the first place.

But okay, if you want to design a next-gen audit replacement, I'm up.
I guess you want to arrange it so that PLT tailcalls the, say, pltenter_v2
hook, which receives the address of the auditee in a static chain register?
This would make it relatively easy to inspect auditee's arguments in the
auditor, even without writing individual audit hooks for each function type
in "reasonable" psABIs. You can even do something like:

typedef void fn(long, long, long, long, long, long);
register fn *r10 asm("r10");
void pltenter_v2(long rdi, long rsi, long rdx, long rcx, long r8, long r9)
{
  // hook actions
  ...
  // invoke the auditee with its original arguments
  [[gnu::musttail]] return r10(rdi, rsi, rdx, rcx, r8, r9);
}

(yes, suggesting to lean on musttail for correctness is intentionally bad advice ;)
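A portable variant of the same forwarding idea is sketched below. It assumes the musttail statement attribute where the compiler provides one (recent GCC and Clang). Otherwise the hypothetical `MUSTTAIL` macro expands to nothing, and the call is not guaranteed to become a jump. The auditee is held in a static variable instead of r10, so the register-preservation property is lost. The functions return long because ISO C does not allow `return expr;` in a void function. `fn6`, `pltenter_v2_model` and `sum6` are hypothetical names.

```c
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL  /* no guarantee: the call may stay an ordinary call */
#endif

typedef long fn6(long, long, long, long, long, long);

static fn6 *auditee;              /* stand-in for the r10 register variable */
static unsigned long hook_calls;  /* hypothetical hook state */

static long pltenter_v2_model(long a, long b, long c, long d, long e, long f)
{
    hook_calls++;  /* hook actions */
    /* The prototype matches the auditee's, so the arguments are already
       in place and the call can be emitted as a plain jump. */
    MUSTTAIL return auditee(a, b, c, d, e, f);
}

static long sum6(long a, long b, long c, long d, long e, long f)
{
    return a + b + c + d + e + f;
}
```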

When compiled with -ffixed-r10 -mgeneral-regs-only, this should properly preserve
all general-purpose registers in the audit hook, and if the hook actions invoke
anything that touches other registers, that becomes the responsibility of
the audit module writers.

By replacing the return address on the stack, this hook can arrange for
the exit hook to be eventually called too, but there's no obvious place
for stashing the original return address, and no easy way to provide the
address of auditee to the exit hook. But, again, that becomes the
responsibility of the audit module writers. Maybe they can use TLS for that.

At the level of the C abstract machine, this is plainly UB due to function type
mismatches, of course, so it is totally doomed from the start
(not to mention the powerpc64 gotcha).

Alternatively, you can arrange the PLT trampoline so that it tailcalls
the audit hook with a replaced return address:

__attribute__((no_caller_saved_registers))
void pltenter_v2(long rdi, long rsi, long rdx, long rcx, long r8, long r9)
{
  // hook actions
  ...
  // the hook returns normally, preserving all registers
  return;
}

After the hook returns, libc code pushes the original return address and
tailcalls the auditee. This is somewhat cleaner, although at the moment
the attribute is not implemented for all targets. Since the hook is invoked
with the same stack pointer as the auditee, it can still inspect all of its
on-stack arguments (including variadic arguments).

HTH
Alexander
