> On Jan 11, 2019, at 7:15 AM, Josh Poimboeuf <jpoim...@redhat.com> wrote:
> 
> On Fri, Jan 11, 2019 at 01:47:01AM +0000, Nadav Amit wrote:
>> Here is an alternative idea (although similar to Steven’s and my code).
>> 
>> Assume that we always explicitly clobber R10 and R11 on static calls, as
>> should be done by the calling convention anyway (and a gcc plugin should
>> allow us to enforce this). Also assume that we hold a table with every
>> source RIP and its matching target.
>> 
>> Now, in the int3 handler, you can take the faulting RIP and search for it in
>> the “static-calls” table, writing RIP+5 (the return address) into R10 and
>> the target into R11. You then make the int3 handler divert code execution
>> by changing pt_regs->rip to point to a new function that does:
>> 
>>      push R10
>>      jmp __x86_indirect_thunk_r11
>> 
>> And then you are done. No?
> 
> IIUC, that sounds pretty much like what Steven proposed:
> 
>  
> https://lkml.kernel.org/r/20181129122000.7fb4fb04@gandalf.local.home

Stupid me. I remembered it slightly differently (with the caller saving the
target in a register).
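For reference, here is a rough user-space C sketch of the int3 step in the scheme quoted above: look up the trapping call site in a sorted table, load R10 with the return address and R11 with the target, and resume in the push/jmp thunk. All names (struct fake_regs, struct static_call_site, static_call_int3()) are hypothetical stand-ins, not kernel APIs. It assumes the usual x86 behavior that int3 traps with the saved RIP one byte past the 0xCC opcode, and a 5-byte call rel32 at the call site.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical subset of the register frame saved on int3. */
struct fake_regs {
    uint64_t ip;   /* saved RIP: one byte past the int3 opcode */
    uint64_t r10;  /* will hold the return address */
    uint64_t r11;  /* will hold the call target */
};

/* Hypothetical table entry: call-site address and its current target. */
struct static_call_site {
    uint64_t src;     /* address of the 5-byte call instruction */
    uint64_t target;  /* function the call should reach */
};

#define CALL_INSN_SIZE 5  /* call rel32 */

/* bsearch() comparator: key is a call-site address, elem a table entry. */
static int cmp_site(const void *key, const void *elem)
{
    uint64_t k = *(const uint64_t *)key;
    const struct static_call_site *s = elem;

    return (k > s->src) - (k < s->src);
}

/*
 * Returns 1 if the trap came from a known static-call site and the
 * registers were redirected, 0 otherwise (regs left untouched).
 * 'table' must be sorted by src; 'thunk' is the address of the
 * hypothetical "push %r10; jmp __x86_indirect_thunk_r11" sequence.
 */
int static_call_int3(struct fake_regs *regs,
                     const struct static_call_site *table, size_t n,
                     uint64_t thunk)
{
    uint64_t site;
    const struct static_call_site *s;

    assert(regs != NULL);
    site = regs->ip - 1;  /* int3 is a trap: back up over the 0xCC byte */
    s = bsearch(&site, table, n, sizeof(*table), cmp_site);
    if (!s)
        return 0;

    regs->r10 = site + CALL_INSN_SIZE;  /* thunk pushes this as the return address */
    regs->r11 = s->target;              /* thunk jumps here via r11 */
    regs->ip = thunk;                   /* return from int3 into the thunk */
    return 1;
}
```

E.g., with a table entry {0x2000, 0xb000}, a trap whose saved ip is 0x2001 leaves r10 = 0x2005, r11 = 0xb000 and ip pointing at the thunk; an address that is not in the table returns 0 and leaves the registers alone.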

> I liked the idea, BUT, how would it work for callee-saved PV ops?  In
> that case there's only one clobbered register to work with (rax).

That would be trickier. How about using per-CPU trampoline code that performs
a direct call to the target, and temporarily disabling preemption (which might
be simplest to do by disabling IRQs):

Static-call modifier:

        1. Calls synchronize_sched() to ensure the per-CPU trampoline is not in use
        2. Patches the jmp in the per-CPU trampoline (see below)
        3. Saves the return address of the call site in [per-CPU scratchpad RIP] (see below)
        4. Configures the int3 handler to use the static-call int3 handler
        5. Patches the call target (as it currently does).

Static-call int3 handler:
        1. Clears IF in the flags saved on the stack, so IRQs stay disabled on return
        2. Sets the saved RIP so that it returns into the per-CPU trampoline

Per-cpu trampoline:
        push [per-CPU scratchpad RIP]
        sti
        jmp [ target ] (this one is patched)

Note that no IRQ can be taken between the STI and the JMP, because of the STI
interrupt shadow (STI blocks interrupts until after the next instruction).

What do you say?
