On Mon, Jul 13, 2020 at 08:05:49AM +0300, Jarkko Sakkinen wrote:
> On Fri, Jul 10, 2020 at 12:49:10PM +0200, Peter Zijlstra wrote:
> > On Fri, Jul 10, 2020 at 01:36:38PM +0300, Jarkko Sakkinen wrote:
> > > Just so that I know (and learn), what did exactly disable optprobes?
> >
> > So regular, old-skool style kprobe is:
> >
> >  - copy original instruction out
> >  - replace instruction with breakpoint (int3 on x86)
> >  - have exception handler return to the copied instruction with
> >    single-step on
> >  - have single step exception handler return to the original
> >    instruction stream
> >
> > which is 2 exceptions.
>
> Out of pure interest, how does it handle a jump (as the original
> opcode), given that it single steps a copy?

Good question, I hadn't ever looked at that detail. Anyway, I dug around
a little and it disallows 'boosting' (replacing the single-step with a
jmp) for jump instructions and takes the double exception. It single
steps the copied jump into 'thin-air' and does a relative fixup of the
resulting IP in the single-step exception.

For more details also see
arch/x86/kernel/kprobes/core.c:resume_execution().

> > optprobes avoid the single-step by not only writing a single
> > instruction, but additionally placing a JMP instruction behind it such
> > that it will automagically continue in the original instruction stream.
> >
> > This brings the requirement that the copied instruction is placed
> > within the JMP displacement of the regular kernel text (s32 on x86).
> >
> > module_alloc() ensures the memory provided is within that range.
>
> Right, a relative jump is placed instead of 0xcc to the breakpoint?

So there's all sorts of optimizations. The one I was talking about is
apparently called boosting. That one still uses INT3 but avoids (where
possible) the single-step #DB trap by appending a JMP.d32 after the
copied instruction.

There's also optimized kprobes, which avoid all traps by replacing the
original instruction(s) with a JMP.d32 to a trampoline; this trampoline
calls the kprobe handler, after which it runs the original
instruction(s) and then a JMP.d32 back to where we came from.

These fully optimized kprobes have very specific constraints, best to
read the code if you want more details.

Anyway, the common theme here is that all the various optimizations rely
on the out-of-line text being within the s32 displacement of relative
jumps.
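
To make the s32 constraint and the relative IP fixup concrete, here is a
minimal user-space sketch. It is not the kernel's code: the helper names
(jmp32_in_range, encode_jmp32, fixup_ip) are made up for illustration,
and it only assumes the standard x86 encoding of JMP.d32 (opcode 0xE9
followed by a little-endian signed 32-bit displacement, relative to the
end of the jump).

```c
#include <stdbool.h>
#include <stdint.h>

#define JMP32_OPCODE 0xE9u /* x86 JMP rel32 */
#define JMP32_SIZE   5     /* 1 opcode byte + 4 displacement bytes */

/*
 * Hypothetical helper: can a JMP.d32 located at 'from' reach 'to'?
 * The displacement is relative to the end of the jump and has to fit
 * in a signed 32-bit value.
 */
static inline bool jmp32_in_range(uint64_t from, uint64_t to)
{
	int64_t disp = (int64_t)(to - (from + JMP32_SIZE));

	return disp >= INT32_MIN && disp <= INT32_MAX;
}

/*
 * Hypothetical helper: encode a JMP.d32 as if it lived at address
 * 'from' and targeted 'to'. Returns false when the target is out of
 * s32 range, which is the case the out-of-line text must avoid.
 */
static inline bool encode_jmp32(uint8_t buf[JMP32_SIZE],
				uint64_t from, uint64_t to)
{
	uint32_t disp;

	if (!jmp32_in_range(from, to))
		return false;

	disp = (uint32_t)(to - (from + JMP32_SIZE));
	buf[0] = JMP32_OPCODE;
	buf[1] = (uint8_t)(disp & 0xff);         /* little-endian bytes */
	buf[2] = (uint8_t)((disp >> 8) & 0xff);
	buf[3] = (uint8_t)((disp >> 16) & 0xff);
	buf[4] = (uint8_t)((disp >> 24) & 0xff);
	return true;
}

/*
 * Hypothetical helper: after single-stepping a copy of an instruction,
 * translate the resulting IP from the copy's address space back to the
 * original instruction stream by applying the same offset.
 */
static inline uint64_t fixup_ip(uint64_t ip_after_step,
				uint64_t copy_addr, uint64_t orig_addr)
{
	return ip_after_step + (orig_addr - copy_addr);
}
```

For example, a relative jump that lands 0x22 bytes past the start of the
copy ends up, after fixup_ip(), 0x22 bytes past the start of the
original instruction, which is where the jump would have gone had it
executed in place.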