> On Jan 11, 2019, at 7:15 AM, Josh Poimboeuf <jpoim...@redhat.com> wrote:
>
> On Fri, Jan 11, 2019 at 01:47:01AM +0000, Nadav Amit wrote:
>> Here is an alternative idea (although similar to Steven’s and my code).
>>
>> Assume that we always clobber R10, R11 on static-calls explicitly, as anyhow
>> should be done by the calling convention (and a gcc plugin should allow us to
>> enforce it). Also assume that we hold a table with all source RIPs and the
>> matching targets.
>>
>> Now, in the int3 handler you can take the faulting RIP and search for it in
>> the “static-calls” table, writing the RIP+5 (offset) into R10 (return
>> address) and the target into R11. You make the int3 handler divert the
>> code execution by changing pt_regs->rip to point to a new function that does:
>>
>>   push R10
>>   jmp __x86_indirect_thunk_r11
>>
>> And then you are done. No?
>
> IIUC, that sounds pretty much like what Steven proposed:
>
>   https://lkml.kernel.org/r/20181129122000.7fb4fb04@gandalf.local.home
Stupid me. I remembered it slightly differently (the caller saving the target
in a register).

> I liked the idea, BUT, how would it work for callee-saved PV ops? In
> that case there's only one clobbered register to work with (rax).

That would be trickier. How about using per-CPU trampoline code to hold a
direct call to the target, and temporarily disabling preemption (which might
be simpler to do by disabling IRQs)?

Static-call modifier:

  1. synchronize_sched() to ensure the per-CPU trampoline is not in use
  2. Patches the jmp in the per-CPU trampoline (see below)
  3. Saves the call source RIP in [per-CPU scratchpad RIP] (below)
  4. Configures the int3 handler to use the static-call int3 handler
  5. Patches the call target (as it currently does)

Static-call int3 handler:

  1. Changes the flags on the stack to keep IRQs disabled on return
  2. Jumps to the per-CPU trampoline on return

Per-CPU trampoline:

  push [per-CPU scratchpad RIP]
  sti
  jmp [target]          (this one is patched)

Note that no IRQ should be possible between the STI and the JMP due to STI
blocking (interrupts stay inhibited for one instruction after STI).

What do you say?
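
For concreteness, here is a rough, untested sketch of what the static-call
int3 handler side could look like. The names are made up for illustration
(is_static_call_site(), struct scs_tramp, scs_trampoline); they stand in for
whatever site lookup and per-CPU trampoline storage we would actually use:

#include <linux/percpu.h>
#include <asm/ptrace.h>
#include <asm/processor-flags.h>

/*
 * Hypothetical per-CPU trampoline storage: each CPU gets its own copy of
 * the push/sti/jmp sequence shown above, plus the scratchpad RIP that the
 * push consumes.  The static-call modifier patches these in steps 2-3,
 * under the protection of synchronize_sched(), before arming the int3.
 */
struct scs_tramp {
        char insns[16];
};
DECLARE_PER_CPU(struct scs_tramp, scs_trampoline);

/* Hypothetical lookup: is this RIP a static-call site being patched? */
bool is_static_call_site(unsigned long ip);

static bool static_call_int3_handler(struct pt_regs *regs)
{
        /* regs->ip points just past the int3; the call site is 1 byte back */
        unsigned long ip = regs->ip - 1;

        if (!is_static_call_site(ip))
                return false;

        /*
         * Step 1: keep IRQs disabled on return, so we cannot leave this
         * CPU before its trampoline has consumed the scratchpad RIP.  The
         * trampoline's STI re-enables interrupts.
         */
        regs->flags &= ~X86_EFLAGS_IF;

        /*
         * Step 2: "return" into this CPU's trampoline, which pushes the
         * saved return RIP and jumps to the patched target.
         */
        regs->ip = (unsigned long)this_cpu_ptr(&scs_trampoline);

        return true;
}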