> On Jan 11, 2019, at 11:03 AM, Linus Torvalds <torva...@linux-foundation.org> > wrote: > > On Fri, Jan 11, 2019 at 7:15 AM Josh Poimboeuf <jpoim...@redhat.com> wrote: >>> Now, in the int3 handler can you take the faulting RIP and search for it in >>> the “static-calls” table, writing the RIP+5 (offset) into R10 (return >>> address) and the target into R11. You make the int3 handler to divert the >>> code execution by changing pt_regs->rip to point to a new function that >>> does: >>> >>> push R10 >>> jmp __x86_indirect_thunk_r11 >>> >>> And then you are done. No? >> >> IIUC, that sounds pretty much like what Steven proposed: >> >> >> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flkml.kernel.org%2Fr%2F20181129122000.7fb4fb04%40gandalf.local.home&data=02%7C01%7Cnamit%40vmware.com%7Cd90f2aee03854a8da5dd08d677f7866e%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636828302371417454&sdata=KEGVjCZbl5z7cCQX7F0%2FOaWfU7l%2Bvd2YDwZMXuOhpGI%3D&reserved=0 >> >> I liked the idea, BUT, how would it work for callee-saved PV ops? In >> that case there's only one clobbered register to work with (rax). > > Actually, there's a much simpler model now that I think about it. > > The BP fixup just fixes up %rip to to point to "bp_int3_handler". > > And that's just a random text address set up by "text_poke_bp()". > > So how about the static call rewriting simply do this: > > - for each static call: > > 1) create a fixup code stub that does > > push $returnaddressforTHIScall > jmp targetforTHIScall > > 2) do > > on_each_cpu(do_sync_core, NULL, 1); > > to make sure all CPU's see this generated code > > 3) do > > text_poke_bp(addr, newcode, newlen, generatedcode); > > Ta-daa! Done. > > In fact, it turns out that even the extra "do_sync_core()" in #2 is > unnecessary, because taking the BP will be serializing on the CPU that > takes it, so we can skip it. > > End result: the text_poke_bp() function will do the two do_sync_core > IPI's that guarantee that by the time it returns, no other CPU is > using the generated code any more, so it can be re-used for the next > static call fixup. > > Notice? No odd emulation, no need to adjust the stack in the BP > handler, just the regular "return to a different IP". > > Now, there is a nasty special case with that stub, though. > > So nasty thing with the whole "generate a stub for each call" case: > because it's dynamic and because of the re-use of the stub, you could > be in the situation where: > > CPU1 CPU2 > ---- ---- > > generate a stub > on_each_cpu(do_sync_core..) > text_poke_bp() > ... > > rewrite to BP > trigger the BP > return to the stub > fun the first instruction of the stub > *INTERRUPT causes rescheduling* > > on_each_cpu(do_sync_core..) > rewrite to good instruction > on_each_cpu(do_sync_core..) > > free or re-generate the stub > > !! The stub is still in use !! > > So that simple "just generate the stub dynamically" isn't so simple after all. > > But it turns out that that is really simple to handle too. How do we do that? > > We do that by giving the BP handler *two* code sequences, and we make > the BP handler pick one depending on whether it is returning to a > "interrupts disabled" or "interrupts enabled" case. > > So the BP handler does this: > > - if we're returning with interrupts disabled, pick the simple stub > > - if we're returning with interrupts enabled, clkear IF in the return > %rflags, and pick a *slightly* more complex stub: > > push $returnaddressforTHIScall > sti > jmp targetforTHIScall > > and now the STI shadow will mean that this sequence is uninterruptible. > > So we'd not do complex emulation of the call instruction at BP time, > but we'd do that *trivial* change at BP time. > > This seems simple, doesn't need any temporary registers at all, and > doesn't need any extra stack magic. It literally needs just a trivial > sequence in poke_int3_handler(). > > The we'd change the end of poke_int3_handler() to do something like > this instead: > > void *newip = bp_int3_handler; > .. > if (new == magic_static_call_bp_int3_handler) { > if (regs->flags &X86_FLAGS_IF) { > newip = magic_static_call_bp_int3_handler_sti; > regs->flags &= ~X86_FLAGS_IF; > } > regs->ip = (unsigned long) newip; > return 1; > > AAND now we're *really* done. > > Does anybody see any issues in this?
I think it’s a better articulated, more detailed version to the solution I proposed in this thread (which also used a patched trampoline with STI+JMP). So obviously I like it. ;-)