On Fri, Jan 11, 2019 at 03:48:59PM +0000, Nadav Amit wrote:
> > I liked the idea, BUT, how would it work for callee-saved PV ops? In
> > that case there's only one clobbered register to work with (rax).
>
> That would be more tricky. How about using a per-CPU trampoline code to
> hold a direct call to the target and temporarily disable preemption (which
> might be simpler by disabling IRQs):
>
> Static-call modifier:
>
> 1. synchronize_sched() to ensure per-cpu trampoline is not used
> 2. Patches the jmp in a per-cpu trampoline (see below)
> 3. Saves the call source RIP in [per-cpu scratchpad RIP] (below)
> 4. Configures the int3 handler to use static-call int3 handler
> 5. Patches the call target (as it currently does).
>
> Static-call int3 handler:
> 1. Changes flags on the stack to keep IRQs disabled on return
> 2. Jumps to per-cpu trampoline on return
>
> Per-cpu trampoline:
>     push [per-CPU scratchpad RIP]
>     sti
>     jmp [ target ] (this one is patched)
>
> Note that no IRQ should be possible between the STI and the JMP due to STI
> blocking.
>
> What do you say?

This could work, but it's more complex than I was hoping for. My
current leading contender is to do call emulation in the #BP handler,
either by making a gap or by doing Andy's longjmp-style thingie.

-- 
Josh