On 12/03/2014 05:54 PM, David Long wrote:
> On 12/03/14 09:54, William Cohen wrote:
>> On 12/01/2014 04:37 AM, Masami Hiramatsu wrote:
>>> (2014/11/29 1:01), Steve Capper wrote:
>>>> On 27 November 2014 at 06:07, Masami Hiramatsu
>>>> <masami.hiramatsu...@hitachi.com> wrote:
>>>>> (2014/11/27 3:59), Steve Capper wrote:
>>>>>> The crash is extremely easy to reproduce.
>>>>>>
>>>>>> I've not observed any missed events on a kprobe on an arm64 system
>>>>>> that's still alive.
>>>>>> My (limited!) understanding is that this suggests there could be a
>>>>>> problem with how missed events from a recursive call to memcpy are
>>>>>> being handled.
>>>>>
>>>>> I think so too. BTW, could you bisect that? :)
>>>>>
>>>>
>>>> I can't bisect, but the following functions look suspicious to me
>>>> (again I'm new to kprobes...):
>>>> kprobes_save_local_irqflag
>>>> kprobes_restore_local_irqflag
>>>>
>>>> I think these are breaking somehow when nested (i.e. from a recursive 
>>>> probe).
>>>
>>> Agreed. On x86, prev_kprobe has old_flags and saved_flags; the arm64
>>> version must at least have saved_irqflag, saved and restored in
>>> save/restore_previous_kprobe().
>>>
>>> What about adding this?
>>>
>>>   struct prev_kprobe {
>>>       struct kprobe *kp;
>>>       unsigned int status;
>>> +    unsigned long saved_irqflag;
>>>   };
>>>
>>> and
>>>
>>>   static void __kprobes save_previous_kprobe(struct kprobe_ctlblk *kcb)
>>>   {
>>>       kcb->prev_kprobe.kp = kprobe_running();
>>>       kcb->prev_kprobe.status = kcb->kprobe_status;
>>> +    kcb->prev_kprobe.saved_irqflag = kcb->saved_irqflag;
>>>   }
>>>
>>>   static void __kprobes restore_previous_kprobe(struct kprobe_ctlblk *kcb)
>>>   {
>>>       __this_cpu_write(current_kprobe, kcb->prev_kprobe.kp);
>>>       kcb->kprobe_status = kcb->prev_kprobe.status;
>>> +    kcb->saved_irqflag = kcb->prev_kprobe.saved_irqflag;
>>>   }
>>>
>>>
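
A minimal userspace sketch of the hypothesis discussed above, not kernel code: the structures are cut-down stand-ins, and cpu_flags, IRQS_MASKED, the kp stand-in type, and run_nested() are illustrative names that do not exist in the kernel. It only models why a nested probe could leave interrupts masked if saved_irqflag is not carried in prev_kprobe, assuming the save/restore helpers behave roughly as modeled here.

```c
#include <assert.h>

#define IRQS_MASKED 0x80UL          /* hypothetical "IRQs disabled" bit */

struct prev_kprobe {
    int kp;                         /* stand-in for struct kprobe * */
    unsigned int status;
    unsigned long saved_irqflag;    /* field proposed in the thread */
};

struct kprobe_ctlblk {
    unsigned int kprobe_status;
    unsigned long saved_irqflag;
    struct prev_kprobe prev_kprobe;
};

static unsigned long cpu_flags;     /* pretend per-CPU interrupt state */

/* Model: remember the current flags, then mask IRQs for the single-step. */
static void save_local_irqflag(struct kprobe_ctlblk *kcb)
{
    kcb->saved_irqflag = cpu_flags;
    cpu_flags |= IRQS_MASKED;
}

/* Model: put back whatever the control block remembers. */
static void restore_local_irqflag(struct kprobe_ctlblk *kcb)
{
    cpu_flags = kcb->saved_irqflag;
}

/* Returns the final interrupt state after an outer probe whose
 * single-step is interrupted by one nested probe. */
static unsigned long run_nested(int with_fix)
{
    struct kprobe_ctlblk kcb = {0};

    cpu_flags = 0;                  /* IRQs enabled before any probe */

    save_local_irqflag(&kcb);       /* outer probe starts stepping */

    /* Nested probe hit while the outer one is stepping. */
    if (with_fix)
        kcb.prev_kprobe.saved_irqflag = kcb.saved_irqflag;
    save_local_irqflag(&kcb);       /* overwrites the outer saved value */
    restore_local_irqflag(&kcb);    /* nested step finished */
    assert(cpu_flags == IRQS_MASKED); /* still inside the outer step */
    if (with_fix)
        kcb.saved_irqflag = kcb.prev_kprobe.saved_irqflag;

    restore_local_irqflag(&kcb);    /* outer step finished */
    return cpu_flags;
}
```

In this model run_nested(0) ends with IRQS_MASKED still set, which is the kind of state that would trip BUG_ON(irqs_disabled()), while run_nested(1) ends with interrupts enabled again. Whether the real crash follows this path is still only a hypothesis.
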
>>
>> With the aarch64 kprobe patches and a recent kernel, I have noticed that I
>> can get the machine stuck printing an endless stream of
>>
>> [187694.855843] Unexpected kernel single-step exception at EL1
>> [187694.861385] Unexpected kernel single-step exception at EL1
>> [187694.866926] Unexpected kernel single-step exception at EL1
>> [187694.872467] Unexpected kernel single-step exception at EL1
>> [187694.878009] Unexpected kernel single-step exception at EL1
>> [187694.883550] Unexpected kernel single-step exception at EL1
>>
>> I can reproduce this pretty easily on my machine with functioncallcount.stp 
>> from 
>> https://sourceware.org/systemtap/examples/profiling/functioncallcount.stp 
>> and the following steps:
>>
>> # stap -p4 -k -m mm_probes -w functioncallcount.stp "*@mm/*.c" -c "sleep 1"
>> # staprun mm_probes.ko -c "sleep 1"
>>
>> -Will
> 
> I did a fresh checkout and build of systemtap and tried the above.  I'm not 
> yet seeing this problem.  It does remind me of the problem we saw before 
> debug exception handling in entry.S was patched in v3.18-rc1, but you say you 
> are using recent kernel sources.
> 

Hi Dave,

I saw this problem with a 3.18.0-rc5 based kernel.  Today I built a kernel
based on 3.18.0-0.rc6.git0.1.x1 with the patches and did not see the
unexpected kernel single-step exception.  I am not sure whether some problem
function was being probed in the 3.18.0-rc5 kernel but not in the 3.18.0-rc6
kernel, or whether the configs of the two kernels differ.  It seemed wiser to
mention it.

>>>
>>>
>>>> That would explain why interrupts are in an unexpected state in the
>>>> crash I reported:
>>>> "The point of failure in the panic was:
>>>> fs/buffer.c:1257
>>>>
>>>> static inline void check_irqs_on(void)
>>>> {
>>>> #ifdef irqs_disabled
>>>>          BUG_ON(irqs_disabled());
>>>> #endif
>>>> }
>>>> "
>>>>
>>>> This is all new to me so I'm still at the head-scratching stage.
>>>
>>> Ah, I see.
>>>
>>> Thank you,
>>>
>>>>
>>>> David,
>>>> Does the above make sense to you? Have you managed to reproduce the crash 
>>>> I get?
>>>>
>>>> Cheers,
>>>> -- 
>>>> Steve
> 
> I have easily produced a crash although it doesn't look to me like the same 
> one.  I'm getting a NULL pointer dereference.  The PMU stuff (used by perf 
> record|stat -e) should be quite independent of kprobes though.
> 
> -dl

The perf issue seems to be independent and can be reproduced without using any 
kprobe support.  I need to get a simple reproducer and mention it on the 
linux-perf-user list.

-Will