On Fri, 17 Aug 2007 15:43:04 -0400 Masami Hiramatsu <[EMAIL PROTECTED]> wrote:
> This patch introduces architecture dependent kretprobe > blacklists to prohibit users from inserting return > probes on the function in which kprobes can be inserted > but kretprobes can not. > > Signed-off-by: Masami Hiramatsu <[EMAIL PROTECTED]> > > --- > <Problem Description> > When a kretprobe is inserted in the entry of the "__switch_to()", > it causes kernel panic on i386 with recent kernel. > > <Problem Analysis> > In include/asm-i386/current.h, "current" is defined as an entry of > percpu variables.: > > DECLARE_PER_CPU(struct task_struct *, current_task); > static __always_inline struct task_struct *get_current(void) > { > return x86_read_percpu(current_task); > } > > #define current get_current() > > This mean the "current" macro is separated from its stack register. > Kretprobe expects that "current" value when a function is called is > not changed, or both of "current" value and stack register are changed in > the target function. > But __switch_to() in arch/i386/kernel/process.c changes only the value of > "current", but doesn't changes the stack register(this was already switched > to new stack before calling __switch_to()): > > struct task_struct fastcall * __switch_to(struct task_struct *prev_p, struct > task_struct *next_p) > { > ... > x86_write_percpu(current_task, next_p); > > return prev_p; > } > > Kretprobe modifies the return address stored in the stack, and > it saves the original return address in the kretprobe_instance with > the value of "current". > When kretprobe is hit at the return of __switch_to(), it searches > kretprobe_instance related to the probe point from a hash list by > using the hash-value of the "current" instead of stack address. > However, since the value of "current" is already changed, > it can't find proper instance, and invokes BUG() macro. > > <Solution> > As a result of discussion with other kprobe developers, I'd > like to introduce arch-dependent blacklist. > To introduce "__kretprobes" mark (like "__kprobes") is another way. > But I thought it is not efficient way, because the kretprobe can > be inserted only in the entry of functions and there is no need to > check against a whole function. > > This patch also removes "__kprobes" mark from "__switch_to" on x86_64 > and registers "__switch_to" to the blacklist on x86-64, because that mark > is to prohibit user from inserting only kretprobe. > > Index: 2.6-mm/include/asm-i386/kprobes.h > =================================================================== > --- 2.6-mm.orig/include/asm-i386/kprobes.h > +++ 2.6-mm/include/asm-i386/kprobes.h > @@ -44,6 +44,7 @@ typedef u8 kprobe_opcode_t; > > #define ARCH_SUPPORTS_KRETPROBES > #define flush_insn_slot(p) do { } while (0) > +#define ARCH_SUPPORTS_KRETPROBE_BLACKLIST Can we avoid adding this please? It should at least have been a CONFIG_foo thing, defined in arch/*/Kconfig. But that still requires nasty ifdefs in the C code. It would be very small overhead just to require that all architectures implement arch_kretprobe_blacklist[] (which can be renamed to kretprobe_blacklist[]). Architectures which don't need a blacklist can just have { { 0, 0 } }. If the few bytes of overhead on non-x86 really offends then one could do something like this: in powerpc header file: #define kretprobe_blacklist_size 0 in x86 header file: extern const int kretprobe_blacklist_size; in x86 C file: const int kretprobe_blacklist_size = ARRAY_SIZE(kretprobe_blacklist); and then this code: > --- 2.6-mm.orig/kernel/kprobes.c > +++ 2.6-mm/kernel/kprobes.c > @@ -716,6 +716,18 @@ int __kprobes register_kretprobe(struct > int ret = 0; > struct kretprobe_instance *inst; > int i; > +#ifdef ARCH_SUPPORTS_KRETPROBE_BLACKLIST > + void *addr = rp->kp.addr; > + > + if (addr == NULL) > + kprobe_lookup_name(rp->kp.symbol_name, addr); > + addr += rp->kp.offset; > + > + for (i = 0; arch_kretprobe_blacklist[i].name != NULL; i++) { > + if (arch_kretprobe_blacklist[i].addr == addr) > + return -EINVAL; > + } > +#endif can be put inside if (kretprobe_blacklist_size) { ... } so the compiler will remove it all for (say) powerpc. There are lots of ways of doing it but code like this: > > +#ifdef ARCH_SUPPORTS_KRETPROBE_BLACKLIST > + /* lookup the function address from its name */ > + for (i = 0; arch_kretprobe_blacklist[i].name != NULL; i++) { > + kprobe_lookup_name(arch_kretprobe_blacklist[i].name, > + arch_kretprobe_blacklist[i].addr); > + if (!arch_kretprobe_blacklist[i].addr) > + printk("kretprobe: Unknown blacklisted function: %s\n", > + arch_kretprobe_blacklist[i].name); > + } > +#endif really isn't the sort of thing we like to see spreading through core kernel code. Have a think about it please, see what we can come up with? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/