> -----Original Message-----
> From: ccr...@google.com [mailto:ccr...@google.com] On Behalf Of Colin Cross
> Sent: Friday, January 11, 2013 1:34 PM
> To: Liu, Chuansheng
> Cc: linux-kernel@vger.kernel.org; Andrew Morton; Don Zickus; Ingo Molnar;
> Thomas Gleixner; linux-arm-ker...@lists.infradead.org
> Subject: Re: [PATCH] hardlockup: detect hard lockups without NMIs using
> secondary cpus
>
> On Thu, Jan 10, 2013 at 5:39 PM, Liu, Chuansheng
> <chuansheng....@intel.com> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Colin Cross [mailto:ccr...@android.com]
> >> Sent: Thursday, January 10, 2013 9:58 AM
> >> To: linux-kernel@vger.kernel.org
> >> Cc: Andrew Morton; Don Zickus; Ingo Molnar; Thomas Gleixner; Liu,
> >> Chuansheng; linux-arm-ker...@lists.infradead.org; Colin Cross
> >> Subject: [PATCH] hardlockup: detect hard lockups without NMIs using
> >> secondary cpus
> >>
> >> Emulate NMIs on systems where they are not available by using timer
> >> interrupts on other cpus.  Each cpu will use its softlockup hrtimer
> >> to check that the next cpu is processing hrtimer interrupts by
> >> verifying that a counter is increasing.
> >>
> >> This patch is useful on systems where the hardlockup detector is not
> >> available due to a lack of NMIs, for example most ARM SoCs.
> >> Without this patch any cpu stuck with interrupts disabled can
> >> cause a hardware watchdog reset with no debugging information,
> >> but with this patch the kernel can detect the lockup and panic,
> >> which can result in useful debugging info.
> >>
> >> Signed-off-by: Colin Cross <ccr...@android.com>
> >> +static void watchdog_check_hardlockup_other_cpu(void)
> >> +{
> >> +	int cpu;
> >> +	cpumask_t cpus = watchdog_cpus;
> >> +
> >> +	/*
> >> +	 * Test for hardlockups every 3 samples.  The sample period is
> >> +	 *  watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
> >> +	 *  watchdog_thresh (over by 20%).
> >> +	 */
> >> +	if (__this_cpu_read(hrtimer_interrupts) % 3 != 0)
> >> +		return;
> >> +

Another concern is the check __this_cpu_read(hrtimer_interrupts) % 3 != 0.
It looks like this makes the actual timeout for hard lockup detection
variable, and sometimes very short: in some cases a lockup is detected
after 3 samples, but in others after only 1 sample. Is that the case?
And in the NMI case, the NMI arrives at least every watchdog_thresh.
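
To make the timing question concrete, here is a rough user-space model in C. It is only a sketch under assumptions, not code from the patch. It assumes that both cpus tick with the same sample period (watchdog_thresh * 2 / 5, as in the quoted comment), that the checking cpu runs its test on every third tick, and that the test compares the watched cpu's counter with the value saved at the previous test. All names (sample_period_ms, ticks_seen, detect_latency_ms, sweep) and the millisecond time base are hypothetical.

```c
#include <assert.h>
#include <limits.h>

#define CHECK_EVERY 3	/* from the quoted "% 3" test */

/* Sample period from the quoted comment: watchdog_thresh * 2 / 5. */
long sample_period_ms(long watchdog_thresh_s)
{
	return watchdog_thresh_s * 1000 * 2 / 5;
}

/*
 * Number of ticks the watched cpu has taken by time 'now'.  It ticks at
 * phase, phase + T, ... and takes no tick at or after lock_ms.
 */
long ticks_seen(long T, long phase, long lock_ms, long now)
{
	long limit = now < lock_ms ? now : lock_ms - 1;

	return limit < phase ? 0 : (limit - phase) / T + 1;
}

/*
 * The checking cpu ticks at 0, T, 2T, ... and tests on every third tick.
 * Assumption: a lockup is reported when the watched counter has not
 * changed since the previous test.  Returns the time from lock-up to
 * report, in ms.
 */
long detect_latency_ms(long T, long phase, long lock_ms)
{
	long saved = -1;	/* counter value seen at the previous test */

	assert(phase >= 0 && phase < T);
	for (long n = 0;; n += CHECK_EVERY) {
		long now = n * T;
		long count = ticks_seen(T, phase, lock_ms, now);

		if (count == saved)
			return now - lock_ms;
		saved = count;
	}
}

struct latency_range {
	long min_ms;
	long max_ms;
};

/* Sweep tick phase and lock-up time on a grid of step_ms. */
struct latency_range sweep(long watchdog_thresh_s, long step_ms)
{
	long T = sample_period_ms(watchdog_thresh_s);
	long window = CHECK_EVERY * T;
	struct latency_range r = { LONG_MAX, 0 };

	for (long phase = 0; phase < T; phase += step_ms) {
		for (long lock = 2 * window; lock < 3 * window; lock += step_ms) {
			long lat = detect_latency_ms(T, phase, lock);

			if (lat < r.min_ms)
				r.min_ms = lat;
			if (lat > r.max_ms)
				r.max_ms = lat;
		}
	}
	return r;
}
```

Calling sweep(10, 100), i.e. watchdog_thresh = 10 on a 100 ms grid, reports how far the detection delay moves in this model as the phase between the two cpus and the moment of the lockup change. With a 4000 ms sample period the delay ranges from about 2 to about 6 sample periods here, so it is not fixed. The real patch may behave differently if any of the assumptions above do not hold.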