On 12/03/2014 05:54 PM, David Long wrote:
> On 12/03/14 09:54, William Cohen wrote:
>> On 12/01/2014 04:37 AM, Masami Hiramatsu wrote:
>>> (2014/11/29 1:01), Steve Capper wrote:
>>>> On 27 November 2014 at 06:07, Masami Hiramatsu
>>>> <masami.hiramatsu...@hitachi.com> wrote:
>>>>> (2014/11/27 3:59), Steve Capper wrote:
>>>>>> The crash is extremely easy to reproduce.
>>>>>>
>>>>>> I've not observed any missed events on a kprobe on an arm64 system
>>>>>> that's still alive.
>>>>>> My (limited!) understanding is that this suggests there could be a
>>>>>> problem with how missed events from a recursive call to memcpy are
>>>>>> being handled.
>>>>>
>>>>> I think so too. BTW, could you bisect that? :)
>>>>>
>>>>
>>>> I can't bisect, but the following functions look suspicious to me
>>>> (again I'm new to kprobes...):
>>>> kprobes_save_local_irqflag
>>>> kprobes_restore_local_irqflag
>>>>
>>>> I think these are breaking somehow when nested (i.e. from a recursive
>>>> probe).
>>>
>>> Agreed. On x86, prev_kprobe has old_flags and saved_flags, this
>>> at least must have saved_irqflag and save/restore it in
>>> save/restore_previous_kprobe().
>>>
>>> What about adding this?
>>>
>>> struct prev_kprobe {
>>> 	struct kprobe *kp;
>>> 	unsigned int status;
>>> +	unsigned long saved_irqflag;
>>> };
>>>
>>> and
>>>
>>> static void __kprobes save_previous_kprobe(struct kprobe_ctlblk *kcb)
>>> {
>>> 	kcb->prev_kprobe.kp = kprobe_running();
>>> 	kcb->prev_kprobe.status = kcb->kprobe_status;
>>> +	kcb->prev_kprobe.saved_irqflag = kcb->saved_irqflag;
>>> }
>>>
>>> static void __kprobes restore_previous_kprobe(struct kprobe_ctlblk *kcb)
>>> {
>>> 	__this_cpu_write(current_kprobe, kcb->prev_kprobe.kp);
>>> 	kcb->kprobe_status = kcb->prev_kprobe.status;
>>> +	kcb->saved_irqflag = kcb->prev_kprobe.saved_irqflag;
>>> }
>>>
>>>
>>
>> I have noticed with the aarch64 kprobe patches and recent kernel I can get
>> the machine to end up getting stuck and printing out endless strings of
>>
>> [187694.855843] Unexpected kernel single-step exception at EL1
>> [187694.861385] Unexpected kernel single-step exception at EL1
>> [187694.866926] Unexpected kernel single-step exception at EL1
>> [187694.872467] Unexpected kernel single-step exception at EL1
>> [187694.878009] Unexpected kernel single-step exception at EL1
>> [187694.883550] Unexpected kernel single-step exception at EL1
>>
>> I can reproduce this pretty easily on my machine with functioncallcount.stp
>> from
>> https://sourceware.org/systemtap/examples/profiling/functioncallcount.stp
>> and the following steps:
>>
>> # stap -p4 -k -m mm_probes -w functioncallcount.stp "*@mm/*.c" -c "sleep 1"
>> # staprun mm_probes.ko -c "sleep 1"
>>
>> -Will
>
> I did a fresh checkout and build of systemtap and tried the above. I'm not
> yet seeing this problem. It does remind me of the problem we saw before
> debug exception handling in entry.S was patched in v3.18-rc1, but you say you
> are using recent kernel sources.
>
Hi Dave,

I saw this problem with a 3.18.0-rc5-based kernel. Today I built a kernel based on 3.18.0-0.rc6.git0.1.x1 with the patches, and I didn't see the unexpected kernel single-step exception. I am not sure whether some problematic function was being probed in the 3.18.0-rc5 kernel but not in the 3.18.0-rc6 kernel, or whether the configs of the two kernels differ. It seemed wise to mention it anyway.

>>>
>>>
>>>> That would explain why the state of play of the interrupts is in an
>>>> unexpected state in the crash I reported:
>>>> "The point of failure in the panic was:
>>>> fs/buffer.c:1257
>>>>
>>>> static inline void check_irqs_on(void)
>>>> {
>>>> #ifdef irqs_disabled
>>>> 	BUG_ON(irqs_disabled());
>>>> #endif
>>>> }
>>>> "
>>>>
>>>> This is all new to me so I'm still at the head-scratching stage.
>>>
>>> Ah, I see.
>>>
>>> Thank you,
>>>
>>>>
>>>> David,
>>>> Does the above make sense to you? Have you managed to reproduce the crash
>>>> I get?
>>>>
>>>> Cheers,
>>>> --
>>>> Steve
>
> I have easily produced a crash although it doesn't look to me like the same
> one. I'm getting a NULL pointer dereference. The PMU stuff (used by perf
> record|stat -e) should be quite independent of kprobes though.
>
> -dl

The perf issue seems to be independent and can be reproduced without any kprobe support. I need to put together a simple reproducer and report it on the linux-perf-user list.

-Will
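The following is a minimal user-space model of why a recursive probe can lose the interrupt state, and why Masami's extra `saved_irqflag` field in `prev_kprobe` would fix it. It is a sketch under stated assumptions, not kernel code. The single `irqs_enabled` bool stands in for the CPU's interrupt mask. `with_fix` and `run_nested_probes()` are hypothetical. `save_local_irqflag()` and `restore_local_irqflag()` only mimic what `kprobes_save_local_irqflag`/`kprobes_restore_local_irqflag` are assumed to do: stash the current flag and mask IRQs for the single-step.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one "CPU" whose interrupt state is a single bool.
 * Struct and field names mirror the patch above, but this is NOT the
 * arm64 kprobes implementation. */
static bool irqs_enabled = true;

struct prev_kprobe {
	int status;
	unsigned long saved_irqflag;   /* the field proposed in the patch */
};

struct kprobe_ctlblk {
	int kprobe_status;
	unsigned long saved_irqflag;
	struct prev_kprobe prev_kprobe;
};

/* Assumed behaviour: remember the current IRQ state, then mask IRQs
 * while the probed instruction is single-stepped. */
static void save_local_irqflag(struct kprobe_ctlblk *kcb)
{
	kcb->saved_irqflag = irqs_enabled ? 1UL : 0UL;
	irqs_enabled = false;
}

static void restore_local_irqflag(const struct kprobe_ctlblk *kcb)
{
	irqs_enabled = (kcb->saved_irqflag != 0);
}

static void save_previous_kprobe(struct kprobe_ctlblk *kcb, bool with_fix)
{
	kcb->prev_kprobe.status = kcb->kprobe_status;
	if (with_fix)
		kcb->prev_kprobe.saved_irqflag = kcb->saved_irqflag;
}

static void restore_previous_kprobe(struct kprobe_ctlblk *kcb, bool with_fix)
{
	kcb->kprobe_status = kcb->prev_kprobe.status;
	if (with_fix)
		kcb->saved_irqflag = kcb->prev_kprobe.saved_irqflag;
}

/* An outer probe is hit with IRQs enabled, and a nested (recursive)
 * probe fires while the outer one is being handled. Returns the IRQ
 * state left behind once both probes have completed. */
static bool run_nested_probes(bool with_fix)
{
	struct kprobe_ctlblk kcb = {0};

	irqs_enabled = true;
	kcb.kprobe_status = 1;              /* outer probe active */
	save_local_irqflag(&kcb);           /* saves "enabled", masks IRQs */

	/* Nested hit: stash the outer probe's state, then handle it. */
	save_previous_kprobe(&kcb, with_fix);
	kcb.kprobe_status = 2;              /* reentered */
	save_local_irqflag(&kcb);           /* overwrites saved_irqflag with "disabled" */
	restore_local_irqflag(&kcb);        /* nested single-step done */
	restore_previous_kprobe(&kcb, with_fix);

	/* Still inside the outer single-step, so IRQs must remain masked. */
	assert(!irqs_enabled);

	restore_local_irqflag(&kcb);        /* outer single-step done */
	return irqs_enabled;
}
```

Without the extra field, `run_nested_probes(false)` returns `false`. The outer probe restores the flag the nested probe saved, which leaves IRQs masked. That would be consistent with the `BUG_ON(irqs_disabled())` in `check_irqs_on()` from Steve's report. With `saved_irqflag` carried in `prev_kprobe`, `run_nested_probes(true)` returns `true`.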