* Alexander Monakov:

>> > Not in a way that will work with LLVM, I'm afraid, and with GCC
>> > you'll have to shield wrappers from LTO:
>> >
>> >     register void *r10 asm("r10");
>> >     void f(int, int);
>> >     void f_wrap(int a, int b)
>> >     {
>> >         r10 = f;
>> >         f(a, b);
>> >     }
>>
>> Does this work on all primary GCC targets?
>
> Yes? I'm not sure why you think it might not work. What are your
> concerns?

Targets not setting up REGISTER_NAMES, or not mentioning the static
chain pointer register in it (not sure if that is even logically
possible).

> I don't think a suitable register is available on all targets, though.
> In particular on 32-bit x86 there is none if you consider -mregparm=3,
> where eax, edx, ecx are used for passing arguments, and the rest are
> callee-saved.
>
> Oh, and my favourite gotcha: indirect tailcalls are not a thing on
> powerpc64.

Well, obviously the current pltenter and pltexit hooks do not work with
tail calls anywhere, and not just on powerpc64le.  And we still manage
to do a tail call from _dl_runtime_resolve to the actual implementation,
after obtaining the address via _dl_fixup.

We'd want to use a special implementation for powerpc64 anyway because
it doesn't need trampolines.

>> > This is the only approach I'm aware of, apart of generating wrappers
>> > in asm (speaking of, is there some reason that wouldn't work for you?).
>>
>> You mean wrappers that inject the extra argument?  That doesn't work for
>> variadic functions.
>
> Why not? I find that generating a _tailcalling_ wrapper works much better
> in asm than in C.

Sure, but tail-calling wrappers are not really wrappers; they are a bit
too one-sided for that.

>> It's also likely to break with unexpected calling
>> conventions.  Variadic functions are always problematic because you
>> can't directly forward to the original function.
>
> Erm, what? Surely you can, by performing a tailcall (in asm).

But that's about it.  You can't actually wrap the call (running your own
code before and after it) because the original argument area won't be at
the right offset, and working around that in a generic fashion requires
really weird tricks.  At this point you are likely better off with
uprobes and similar tools.

>> I'm looking for a possible replacement for the pltenter/pltexit wrappers
>> in glibc's audit functionality.
>> The current approach breaks if
>> procedure call standards evolve:
>
> Okay, I guess there was a bit of XY problem at play here :)

Not really.

> But okay, if you want to design a next-gen audit replacement, I'm up.
> I guess you want to arrange it so that PLT tailcalls the, say, pltenter_v2
> hook, which receives the address of the auditee in a static chain register?
> This would make it relatively easy to inspect auditee's arguments in the
> auditor, even without writing individual audit hooks per each function type
> in "reasonable" psABIs. You can even do something like:
>
>     typedef void fn(long, long, long, long, long, long);
>     register fn *r10 asm("r10");
>     void pltenter_v2(long rdi, long rsi, long rdx, long rcx, long r8, long r9)
>     {
>         // hook actions
>         ...
>         // invoke the auditee with its original arguments
>         [[gnu::musttail]] return r10(rdi, rsi, rdx, rcx, r8, r9);
>     }

I think the missing piece of context here is that I've given up on the
idea of signature-independent wrappers.  It seems that in practice
(outside of tools like sotruss/latrace, and those don't really work and
are largely unmaintained), the type signature of the wrapped functions
is known quite well, and LD_PRELOAD could be used if not for the
multiple instances problem mentioned below.  With the currently
implemented pltenter/pltexit approach (and what you are proposing
above), that's not really true because the wrapper is extremely
architecture-specific.  And if you know the signature and write the
wrapper in C, the need for -mgeneral-regs-only goes away, too (unless
you are trying to wrap a function that was written using a PCS that your
current toolchain doesn't support, ugh).

LD_AUDIT offers outright replacement of the address for this scenario,
but it breaks down if the targeted shared object is loaded multiple
times (either literally into different dynamic linker namespaces, or
conceptually, with different sonames).  This is where LD_PRELOAD
wrapping becomes problematic, true.
With address replacement, there is just no contextual information at all
provided by the dynamic linker.  If you need it, you need to generate
your own trampoline.  And then we end up with the question how the
trampoline can forward the information to the wrapping function.
(Without multiple instances, you could just store the real
implementation address and information about the shared object or the
implementation in global variables.)

> By replacing the return address on the stack, this hook can arrange for
> the exit hook to be eventually called too, but there's no obvious place
> for stashing the original return address, and no easy way to provide the
> address of auditee to the exit hook. But, again, that becomes the
> responsibility of the audit module writers. Maybe they can use TLS for that.

We'd need to wrap all signal handlers so that they don't clobber that
TLS data by accident.  I really do not want to go there.

Thanks,
Florian
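
For illustration only, here is a minimal sketch of the single-instance
case described above: the signature is known, the wrapper is plain C, and
the real implementation address is kept in a global variable.  All names
(real_add, real_impl, add_wrap, the counters) are hypothetical and not
part of glibc or any audit interface; in a real setup the pointer would
be filled in by whatever performs the address replacement.

```c
/* Hypothetical wrapped function with a known signature. */
static int real_add(int a, int b)
{
    return a + b;
}

/* Single-instance assumption: one global holds the real implementation.
   With multiple instances of the shared object this breaks down, because
   one global cannot tell the instances apart. */
static int (*real_impl)(int, int) = real_add;

/* Hypothetical bookkeeping standing in for enter/exit hook actions. */
static int enter_count;
static int exit_count;

/* Signature-specific wrapper in C: code runs before and after the call,
   and no architecture-specific trampoline is involved. */
static int add_wrap(int a, int b)
{
    ++enter_count;                 /* action on entry */
    int result = real_impl(a, b);  /* forward to the real implementation */
    ++exit_count;                  /* action on exit */
    return result;
}
```

Calling add_wrap(2, 3) forwards to real_add through the global pointer
and bumps both counters once.  It does not address the multiple-instances
problem, which is exactly where the trampoline question comes in.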