(2015/02/27 15:38), David Long wrote:
> On 02/18/15 09:59, Masami Hiramatsu wrote:
>> Hi,
>>
>> (2015/02/18 8:11), David Long wrote:
>>> From: Sandeepa Prabhu <sandeepa.pra...@linaro.org>
>>>
>>> Add support for basic kernel probes(kprobes) and jump probes
>>> (jprobes) for ARM64.
>>>
>>> Kprobes will utilize software breakpoint and single step debug
>>> exceptions supported on ARM v8.
>>>
>>> Software breakpoint is placed at the probe address to trap the
>>> kernel execution into kprobe handler.
>>>
>>> ARM v8 supports single stepping to be enabled while exception return
>>> (ERET) with next PC in exception return address (ELR_EL1). The
>>> kprobe handler prepares an executable memory slot for out-of-line
>>> execution with a copy of the original instruction being probed, and
>>> enables single stepping from the instruction slot. With this scheme,
>>> the instruction is executed with the exact same register context
>>> 'except PC' that points to instruction slot.
>>>
>>> Debug mask(PSTATE.D) is enabled only when single stepping a recursive
>>
>> Is that the same as "debug flag" in the code comment?
>>
> 
> Yes. The description in the commit might benefit from a little rewording.
> 
>>> kprobe, e.g.: during kprobes reenter so that probed instruction can be
>>> single stepped within the kprobe handler -exception- context.
>>> The recursion depth of kprobe is always 2, i.e. upon probe re-entry,
>>> any further re-entry is prevented by not calling handlers and the case
>>> counted as a missed kprobe).
>>
>> So, would this mean the debug mask is required for single-stepping
>> in exception context by specification?
>>
> 
> Not positive I exactly understand the question, but:
> 
> 1) We can't single-step at all until we've both set the SS mode bit
> (PSTATE.SS) and unmasked debug exceptions (PSTATE.D).  The debug
> exception is normally masked.
> 2) PSTATE.D will be set at exception entry.
> 3) We can't single-step again until we RTE from the current single-step.

Understood, thanks.

> 
>>>
>>> Single stepping from slot has a drawback on PC-relative accesses
>>> like branching and symbolic literals access as offset from new PC
>>> (slot address) may not be ensured to fit in immediate value of
>>> opcode. Such instructions needs simulation, so reject
>>> probing such instructions.
>>
>> BTW, since IRQs are disabled while single stepping, does that also
>> require fixups for irq-mask loading instructions or something like
>> that? (sorry, I'm not good at the aarch64 ISA...)
> 
> No, we disable interrupts and reject probing of the special instructions
> that could reenable them *or* read the current status (since the
> single-step environment will likely see a different value for that bit).
>  In previous patches I did not reject the reading of this bit.

OK, I see.


>>> Instructions generating exceptions or cpu mode change are rejected,
>>> and not allowed to insert probe for these instructions.
>>>
>>> Instructions using Exclusive Monitor are rejected too.
>>>
>>> System instructions are mostly enabled for stepping, except MSR
>>> immediate that updates "daif" flags in PSTATE, which are not safe
>>> for probing.
>>>
>>> Thanks to Steve Capper and Pratyush Anand for several suggested
>>> Changes.
>>
>> Ok, I have some comments on the code.
>>
[...]
>>> +static void __kprobes set_current_kprobe(struct kprobe *p)
>>> +{
>>> +   __this_cpu_write(current_kprobe, p);
>>> +}
>>> +
>>> +/*
>>> + * Debug flag (D-flag) is disabled upon exception entry.
>>> + * Kprobes need to unmask D-flag -ONLY- in case of recursive
>>> + * probe i.e. when probe hit from kprobe handler context upon
>>> + * executing the pre/post handlers. In this case we return with
>>> + * D-flag unmasked so that single-stepping can be carried-out.
>>> + *
>>> + * Keep D-flag masked in all other cases.
>>
>> Here, we'd better choose either "D-flag" or "Debug mask" and use it
>> consistently.
>>
> 
> "Debug Mask" and "Debug exception mask" seems to be what the ARM calls
> it.  I can change it to "Debug Mask".

Yeah, that would be easier to understand.

> 
>>> + */
>>> +static void __kprobes
>>> +spsr_set_debug_flag(struct pt_regs *regs, int mask)
>>> +{
>>> +   unsigned long spsr = regs->pstate;
>>> +
>>> +   if (mask)
>>> +           spsr |= PSR_D_BIT;
>>> +   else
>>> +           spsr &= ~PSR_D_BIT;
>>> +
>>> +   regs->pstate = spsr;
>>> +}
>>> +
>>> +/*
>>> + * Interrupts need to be disabled before single-step mode is set, and not
>>> + * reenabled until after single-step mode ends.
>>> + * Without disabling interrupt on local CPU, there is a chance of
>>> + * interrupt occurrence in the period of exception return and  start of
>>> + * out-of-line single-step, that result in wrongly single stepping
>>> + * the interrupt handler.
>>> + */
>>> +static void __kprobes kprobes_save_local_irqflag(struct pt_regs *regs)
>>> +{
>>> +   struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
>>> +
>>> +   kcb->saved_irqflag = regs->pstate;
>>> +   regs->pstate |= PSR_I_BIT;
>>> +}
>>> +
>>> +static void __kprobes kprobes_restore_local_irqflag(struct pt_regs *regs)
>>> +{
>>> +   struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
>>> +
>>> +   if (kcb->saved_irqflag & PSR_I_BIT)
>>> +           regs->pstate |= PSR_I_BIT;
>>> +   else
>>> +           regs->pstate &= ~PSR_I_BIT;
>>> +}
>>> +
>>> +static void __kprobes
>>> +set_ss_context(struct kprobe_ctlblk *kcb, unsigned long addr)
>>> +{
>>> +   kcb->ss_ctx.ss_status = KPROBES_STEP_PENDING;
>>> +   kcb->ss_ctx.match_addr = addr + sizeof(kprobe_opcode_t);
>>> +}
>>> +
>>> +static void __kprobes clear_ss_context(struct kprobe_ctlblk *kcb)
>>> +{
>>> +   kcb->ss_ctx.ss_status = KPROBES_STEP_NONE;
>>> +   kcb->ss_ctx.match_addr = 0;
>>> +}
>>> +
>>> +static void __kprobes
>>> +skip_singlestep_missed(struct kprobe_ctlblk *kcb, struct pt_regs *regs)
>>> +{
>>> +   /* set return addr to next pc to continue */
>>> +   instruction_pointer(regs) += sizeof(kprobe_opcode_t);
>>> +}
>>> +
>>> +static void __kprobes setup_singlestep(struct kprobe *p,
>>> +                                  struct pt_regs *regs,
>>> +                                  struct kprobe_ctlblk *kcb, int reenter)
>>> +{
>>> +   unsigned long slot;
>>> +
>>> +   if (reenter) {
>>> +           save_previous_kprobe(kcb);
>>> +           set_current_kprobe(p);
>>> +           kcb->kprobe_status = KPROBE_REENTER;
>>> +   } else {
>>> +           kcb->kprobe_status = KPROBE_HIT_SS;
>>> +   }
>>> +
>>> +   if (p->ainsn.insn) {
>>> +           /* prepare for single stepping */
>>> +           slot = (unsigned long)p->ainsn.insn;
>>> +
>>> +           set_ss_context(kcb, slot);      /* mark pending ss */
>>> +
>>> +           if (kcb->kprobe_status == KPROBE_REENTER)
>>> +                   spsr_set_debug_flag(regs, 0);
>>> +
>>> +           /* IRQs and single stepping do not mix well. */
>>> +           kprobes_save_local_irqflag(regs);
>>> +           kernel_enable_single_step(regs);
>>> +           instruction_pointer(regs) = slot;
>>> +   } else  {
>>> +           BUG();
>>> +   }
>>> +}
>>> +
>>> +static int __kprobes reenter_kprobe(struct kprobe *p,
>>> +                               struct pt_regs *regs,
>>> +                               struct kprobe_ctlblk *kcb)
>>> +{
>>> +   switch (kcb->kprobe_status) {
>>> +   case KPROBE_HIT_SSDONE:
>>> +   case KPROBE_HIT_ACTIVE:
>>> +           if (!p->ainsn.check_condn || p->ainsn.check_condn(p, regs)) {
>>> +                   kprobes_inc_nmissed_count(p);
>>> +                   setup_singlestep(p, regs, kcb, 1);
>>> +           } else  {
>>> +                   /* condition check failed, skip stepping */
>>> +                   skip_singlestep_missed(kcb, regs);
>>> +           }
>>> +           break;
>>> +   case KPROBE_HIT_SS:
>>
>> Here, KPROBE_REENTER should be needed.
> 
> Yes.
> 
>> And also, do we really need to fail in the KPROBE_HIT_SS case?
>> On x86, I've fixed similar code, since kprobes can hit during single
>> stepping if we put a handler in perf and it is hit in NMI context,
>> which can happen while single-stepping...
>>
> 
> Pratyush at Red Hat has been trying to deal with this for uprobes and
> the best information we have from ARM is that you can't single-step
> while processing a single-step exception.

OK, that is a different point from x86.

Actually, on x86, that can happen if the kernel gets NMIs frequently,
e.g. when using perf. I recently learned that ARM has no NMI, but a
fast interrupt (FIQ) which can be masked. If perf on ARMv8 uses the
FIQ without masking it, we have to mask it until single-stepping is done.

>>> +           pr_warn("Unrecoverable kprobe detected at %p.\n", p->addr);
>>> +           dump_kprobe(p);
>>> +           BUG();
>>> +           break;
>>> +   default:
>>> +           WARN_ON(1);
>>> +           return 0;
>>> +   }
>>> +
>>> +   return 1;
>>> +}
>>> +
>>> +static void __kprobes
>>> +post_kprobe_handler(struct kprobe_ctlblk *kcb, struct pt_regs *regs)
>>> +{
>>> +   struct kprobe *cur = kprobe_running();
>>> +
>>> +   if (!cur)
>>> +           return;
>>> +
>>> +   /* return addr restore if non-branching insn */
>>> +   if (cur->ainsn.restore.type == RESTORE_PC) {
>>> +           instruction_pointer(regs) = cur->ainsn.restore.addr;
>>> +           if (!instruction_pointer(regs))
>>> +                   BUG();
>>> +   }
>>> +
>>> +   /* restore back original saved kprobe variables and continue */
>>> +   if (kcb->kprobe_status == KPROBE_REENTER) {
>>> +           restore_previous_kprobe(kcb);
>>> +           return;
>>> +   }
>>> +   /* call post handler */
>>> +   kcb->kprobe_status = KPROBE_HIT_SSDONE;
>>> +   if (cur->post_handler)  {
>>> +           /* post_handler can hit breakpoint and single step
>>> +            * again, so we enable D-flag for recursive exception.
>>> +            */
>>> +           cur->post_handler(cur, regs, 0);
>>> +   }
>>> +
>>> +   reset_current_kprobe();
>>> +}
>>> +
>>> +int __kprobes kprobe_fault_handler(struct pt_regs *regs, unsigned int fsr)
>>> +{
>>> +   struct kprobe *cur = kprobe_running();
>>> +   struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
>>> +
>>> +   switch (kcb->kprobe_status) {
>>> +   case KPROBE_HIT_SS:
>>> +   case KPROBE_REENTER:
>>> +           /*
>>> +            * We are here because the instruction being single
>>> +            * stepped caused a page fault. We reset the current
>>> +            * kprobe and the ip points back to the probe address
>>> +            * and allow the page fault handler to continue as a
>>> +            * normal page fault.
>>> +            */
>>> +           instruction_pointer(regs) = (unsigned long)cur->addr;
>>> +           if (!instruction_pointer(regs))
>>> +                   BUG();
>>> +           if (kcb->kprobe_status == KPROBE_REENTER)
>>> +                   restore_previous_kprobe(kcb);
>>> +           else
>>> +                   reset_current_kprobe();
>>> +
>>> +           break;
>>> +   case KPROBE_HIT_ACTIVE:
>>> +   case KPROBE_HIT_SSDONE:
>>> +           /*
>>> +            * We increment the nmissed count for accounting,
>>> +            * we can also use npre/npostfault count for accounting
>>> +            * these specific fault cases.
>>> +            */
>>> +           kprobes_inc_nmissed_count(cur);
>>> +
>>> +           /*
>>> +            * We come here because instructions in the pre/post
>>> +            * handler caused the page_fault, this could happen
>>> +            * if handler tries to access user space by
>>> +            * copy_from_user(), get_user() etc. Let the
>>> +            * user-specified handler try to fix it first.
>>> +            */
>>> +           if (cur->fault_handler && cur->fault_handler(cur, regs, fsr))
>>> +                   return 1;
>>> +
>>> +           /*
>>> +            * In case the user-specified fault handler returned
>>> +            * zero, try to fix up.
>>> +            */
>>> +           if (fixup_exception(regs))
>>> +                   return 1;
>>> +
>>> +           break;
>>> +   }
>>> +   return 0;
>>> +}
>>> +
>>> +int __kprobes kprobe_exceptions_notify(struct notifier_block *self,
>>> +                                  unsigned long val, void *data)
>>> +{
>>> +   return NOTIFY_DONE;
>>> +}
>>> +
>>> +void __kprobes kprobe_handler(struct pt_regs *regs)
>>> +{
>>> +   struct kprobe *p, *cur;
>>> +   struct kprobe_ctlblk *kcb;
>>> +   unsigned long addr = instruction_pointer(regs);
>>> +
>>> +   kcb = get_kprobe_ctlblk();
>>> +   cur = kprobe_running();
>>> +
>>> +   p = get_kprobe((kprobe_opcode_t *) addr);
>>
>> BTW, I'd like to ensure that this handler runs with normal irqs
>> disabled.
>> Or, we need preempt_disable() here.
>> # From the same reason, x86 implementation is a bit odd...
> 
> The setup_singlestep() call in there will disable irq's before stepping.
>  Is there something else in there that worries you?

I meant the preempt_disable()s in the x86 kprobe handler. Actually, there is
no preempt_disable() in this handler, so I guess this runs with irqs disabled.

>>
>>> +
>>> +   if (p) {
>>> +           if (cur) {
>>> +                   if (reenter_kprobe(p, regs, kcb))
>>> +                           return;
>>> +           } else if (!p->ainsn.check_condn ||
>>> +                      p->ainsn.check_condn(p, regs)) {
>>> +                   /* Probe hit and conditional execution check ok. */
>>> +                   set_current_kprobe(p);
>>> +                   kcb->kprobe_status = KPROBE_HIT_ACTIVE;
>>> +
>>> +                   /*
>>> +                    * If we have no pre-handler or it returned 0, we
>>> +                    * continue with normal processing.  If we have a
>>> +                    * pre-handler and it returned non-zero, it prepped
>>> +                    * for calling the break_handler below on re-entry,
>>> +                    * so get out doing nothing more here.
>>> +                    *
>>> +                    * pre_handler can hit a breakpoint and can step thru
>>> +                    * before return, keep PSTATE D-flag enabled until
>>> +                    * pre_handler return back.
>>> +                    */
>>> +                   if (!p->pre_handler || !p->pre_handler(p, regs)) {
>>> +                           kcb->kprobe_status = KPROBE_HIT_SS;
>>> +                           setup_singlestep(p, regs, kcb, 0);
>>> +                           return;
>>> +                   }
>>> +           } else {
>>> +                   /*
>>> +                    * Breakpoint hit but conditional check failed,
>>> +                    * so just skip the instruction (NOP behaviour)
>>> +                    */
>>> +                   skip_singlestep_missed(kcb, regs);
>>> +                   return;
>>> +           }
>>> +   } else if (*(kprobe_opcode_t *) addr != BRK64_OPCODE_KPROBES) {
>>> +           /*
>>> +            * The breakpoint instruction was removed right
>>> +            * after we hit it.  Another cpu has removed
>>> +            * either a probepoint or a debugger breakpoint
>>> +            * at this address.  In either case, no further
>>> +            * handling of this interrupt is appropriate.
>>> +            * Return back to original instruction, and continue.
>>> +            */
>>> +           return;
>>> +   } else if (cur) {
>>> +           /* We probably hit a jprobe.  Call its break handler. */
>>> +           if (cur->break_handler && cur->break_handler(cur, regs)) {
>>> +                   kcb->kprobe_status = KPROBE_HIT_SS;
>>> +                   setup_singlestep(cur, regs, kcb, 0);
>>> +                   return;
>>> +           }
>>> +   } else {
>>> +           /* breakpoint is removed, now in a race
>>> +            * Return back to original instruction & continue.
>>> +            */
>>> +   }
>>> +}
>>> +
>>> +static int __kprobes
>>> +kprobe_ss_hit(struct kprobe_ctlblk *kcb, unsigned long addr)
>>> +{
>>> +   if ((kcb->ss_ctx.ss_status == KPROBES_STEP_PENDING)
>>> +       && (kcb->ss_ctx.match_addr == addr)) {
>>> +           clear_ss_context(kcb);  /* clear pending ss */
>>> +           return DBG_HOOK_HANDLED;
>>> +   }
>>> +   /* not ours, kprobes should ignore it */
>>> +   return DBG_HOOK_ERROR;
>>> +}
>>> +
>>> +static int __kprobes
>>> +kprobe_single_step_handler(struct pt_regs *regs, unsigned int esr)
>>> +{
>>> +   struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
>>> +   int retval;
>>> +
>>> +   /* return error if this is not our step */
>>> +   retval = kprobe_ss_hit(kcb, instruction_pointer(regs));
>>> +
>>> +   if (retval == DBG_HOOK_HANDLED) {
>>> +           kprobes_restore_local_irqflag(regs);
>>> +           kernel_disable_single_step();
>>> +
>>> +           if (kcb->kprobe_status == KPROBE_REENTER)
>>> +                   spsr_set_debug_flag(regs, 1);
>>> +
>>> +           post_kprobe_handler(kcb, regs);
>>> +   }
>>> +
>>> +   return retval;
>>> +}
>>> +
>>> +static int __kprobes
>>> +kprobe_breakpoint_handler(struct pt_regs *regs, unsigned int esr)
>>> +{
>>> +   kprobe_handler(regs);
>>> +   return DBG_HOOK_HANDLED;
>>> +}
>>> +
>>> +int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
>>> +{
>>> +   struct jprobe *jp = container_of(p, struct jprobe, kp);
>>> +   struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
>>> +   long stack_ptr = stack_pointer(regs);
>>> +
>>> +   kcb->jprobe_saved_regs = *regs;
>>> +   memcpy(kcb->jprobes_stack, (void *)stack_ptr,
>>> +          MIN_STACK_SIZE(stack_ptr));
>>> +
>>> +   instruction_pointer(regs) = (long)jp->entry;
>>> +   preempt_disable();
>>> +   return 1;
>>> +}
>>> +
>>> +void __kprobes jprobe_return(void)
>>> +{
>>> +   struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
>>> +
>>> +   /*
>>> +    * Jprobe handler return by entering break exception,
>>> +    * encoded same as kprobe, but with following conditions
>>> +    * -a magic number in x0 to identify from rest of other kprobes.
>>> +    * -restore stack addr to original saved pt_regs
>>> +    */
>>> +   asm volatile ("ldr x0, [%0]\n\t"
>>> +                 "mov sp, x0\n\t"
>>> +                 "ldr x0, =" __stringify(JPROBES_MAGIC_NUM) "\n\t"
>>> +                 "BRK %1\n\t"
>>> +                 "NOP\n\t"
>>> +                 :
>>> +                 : "r"(&kcb->jprobe_saved_regs.sp),
>>> +                 "I"(BRK64_ESR_KPROBES)
>>> +                 : "memory");
>>> +}
>>> +
>>> +int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
>>> +{
>>> +   struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
>>> +   long stack_addr = kcb->jprobe_saved_regs.sp;
>>> +   long orig_sp = stack_pointer(regs);
>>> +   struct jprobe *jp = container_of(p, struct jprobe, kp);
>>> +
>>> +   if (regs->regs[0] == JPROBES_MAGIC_NUM) {
>>> +           if (orig_sp != stack_addr) {
>>> +                   struct pt_regs *saved_regs =
>>> +                       (struct pt_regs *)kcb->jprobe_saved_regs.sp;
>>> +                   pr_err("current sp %lx does not match saved sp %lx\n",
>>> +                          orig_sp, stack_addr);
>>> +                   pr_err("Saved registers for jprobe %p\n", jp);
>>> +                   show_regs(saved_regs);
>>> +                   pr_err("Current registers\n");
>>> +                   show_regs(regs);
>>> +                   BUG();
>>> +           }
>>> +           *regs = kcb->jprobe_saved_regs;
>>> +           memcpy((void *)stack_addr, kcb->jprobes_stack,
>>> +                  MIN_STACK_SIZE(stack_addr));
>>> +           preempt_enable_no_resched();
>>> +           return 1;
>>> +   }
>>> +   return 0;
>>> +}
>>> +
>>> +/* Break Handler hook */
>>> +static struct break_hook kprobes_break_hook = {
>>> +   .esr_mask = BRK64_ESR_MASK,
>>> +   .esr_val = BRK64_ESR_KPROBES,
>>> +   .fn = kprobe_breakpoint_handler,
>>> +};
>>> +
>>> +/* Single Step handler hook */
>>> +static struct step_hook kprobes_step_hook = {
>>> +   .fn = kprobe_single_step_handler,
>>> +};
>>> +
>>> +int __init arch_init_kprobes(void)
>>> +{
>>> +   register_break_hook(&kprobes_break_hook);
>>> +   register_step_hook(&kprobes_step_hook);
>>
>> I really recommend that you not use this "general hook caller" for
>> kprobes, since it is called from everywhere; in particular, if you
>> enable lockdep, read_lock can involve many functions. This means
>> you can fall into an infinite loop of breakpoint exceptions...
>>
> 
> OK.  I hope no one will object to any code duplication that results.

Anyway, on x86, we already have kgdb, ftrace and kprobes which are directly
invoked from do_int3...

>> Moreover, no one removes these hook handlers from the list.
>>
> 
> Since this is called from the architecture-independent kprobes code, and
> since that code has no module_exit() code, I don't see a clean way to
> get a call into a new arch_exit_kprobes() which could unregister these
> hooks.  kprobes in general does not appear to be unloadable.

Right.

Thank you,


-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com

