On Fri, Jan 11, 2019 at 03:48:59PM +0000, Nadav Amit wrote:
> > I liked the idea, BUT, how would it work for callee-saved PV ops?  In
> > that case there's only one clobbered register to work with (rax).
> 
> That would be trickier. How about using per-CPU trampoline code to
> hold a direct call to the target and temporarily disable preemption (which
> might be simpler by disabling IRQs):
> 
> Static-call modifier:
> 
>       1. synchronize_sched() to ensure per-cpu trampoline is not used
>       2. Patches the jmp in a per-cpu trampoline (see below)
>       3. Saves the call source RIP in [per-cpu scratchpad RIP] (below)
>       4. Configures the int3 handler to use static-call int3 handler
>       5. Patches the call target (as it currently does).
> 
> Static-call int3 handler:
>       1. Changes flags on the stack to keep IRQs disabled on return
>       2. Jumps to per-cpu trampoline on return
> 
> Per-cpu trampoline:
>       push [per-CPU scratchpad RIP]
>       sti
>       jmp [ target ] (this one is patched)
> 
> Note that no IRQ should be possible between the STI and the JMP due to STI
> blocking.
> 
> What do you say?

This could work, but it's more complex than I was hoping for.

My current leading contender is to do call emulation in the #BP handler,
either by making a gap or by doing Andy's longjmp-style thingie.
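
To make "call emulation" concrete, here is a rough userspace sketch of the
idea, not kernel code. Everything in it is made up for illustration: the
fake_regs struct, the fake_stack array, and the emulate_push() and
emulate_call() helpers. It assumes a 1-byte int3 patched over the first byte
of a 5-byte call, so the saved IP points one byte past the call site. The
handler pushes the address of the instruction after the call and redirects
the IP to the target, which is what the call itself would have done.

```c
#include <assert.h>
#include <stdint.h>

#define INT3_INSN_SIZE 1	/* int3 is a single byte */
#define CALL_INSN_SIZE 5	/* call rel32: opcode + 32-bit offset */

/* Hypothetical stand-in for the saved register state; not struct pt_regs. */
struct fake_regs {
	uint64_t ip;	/* address right after the int3 byte */
	uint64_t sp;	/* byte offset into fake_stack; grows down */
};

/* Hypothetical stand-in for the interrupted context's stack. */
static uint64_t fake_stack[16];

/* Emulate "push val" on the interrupted stack. */
static void emulate_push(struct fake_regs *regs, uint64_t val)
{
	assert(regs->sp >= sizeof(uint64_t));	/* room for one more slot */
	regs->sp -= sizeof(uint64_t);
	fake_stack[regs->sp / sizeof(uint64_t)] = val;
}

/* Emulate the call that the int3 byte replaced. */
static void emulate_call(struct fake_regs *regs, uint64_t target)
{
	uint64_t call_site = regs->ip - INT3_INSN_SIZE;

	/* Return address is the instruction following the original call. */
	emulate_push(regs, call_site + CALL_INSN_SIZE);
	/* Resume at the call target instead of after the int3. */
	regs->ip = target;
}
```

The sketch ignores the hard part, which is where the pushed return address
can safely be written on the real stack. As I understand it, that is the
problem the "gap" option is meant to solve.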

-- 
Josh
