Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu

2014-08-20 Thread Don Zickus
On Thu, Aug 21, 2014 at 09:37:04AM +0800, Chai Wen wrote: > On 08/19/2014 09:36 AM, Chai Wen wrote: > > > On 08/19/2014 04:38 AM, Don Zickus wrote: > > > >> On Mon, Aug 18, 2014 at 09:02:00PM +0200, Ingo Molnar wrote: > >>> > >>> * Don

Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu

2014-08-18 Thread Don Zickus
On Mon, Aug 18, 2014 at 09:02:00PM +0200, Ingo Molnar wrote: > > * Don Zickus wrote: > > > > > > So I agree with the motivation of this improvement, but > > > > > is this implementation namespace-safe? > > > > > > > > What name

Re: [PATCH 4/5] watchdog: control hard lockup detection default

2014-08-18 Thread Don Zickus
On Mon, Aug 18, 2014 at 08:07:35PM +0200, Ingo Molnar wrote: > > * Don Zickus wrote: > > > On Mon, Aug 18, 2014 at 11:16:44AM +0200, Ingo Molnar wrote: > > > > > > * Don Zickus wrote: > > > > > > > The running kernel still has the abilit

Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu

2014-08-18 Thread Don Zickus
On Mon, Aug 18, 2014 at 08:01:58PM +0200, Ingo Molnar wrote: > > > > duration = is_softlockup(touch_ts); > > > > if (unlikely(duration)) { > > > > + pid_t pid = task_pid_nr(current); > > > > + > > > > /* > > > > * If a virtual machine i

Re: [PATCH 4/5] watchdog: control hard lockup detection default

2014-08-18 Thread Don Zickus
On Mon, Aug 18, 2014 at 11:16:44AM +0200, Ingo Molnar wrote: > > * Don Zickus wrote: > > > The running kernel still has the ability to enable/disable at any > > time with /proc/sys/kernel/nmi_watchdog as usual. However even > > when the default has been overridden /p

Re: [PATCH 4/5] watchdog: control hard lockup detection default

2014-08-18 Thread Don Zickus
On Mon, Aug 18, 2014 at 11:12:39AM +0200, Ingo Molnar wrote: > > * Don Zickus wrote: > > > From: Ulrich Obergfell > > > > In some cases we don't want hard lockup detection enabled by default. > > An example is when running as a guest. Introduce > >

Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu

2014-08-18 Thread Don Zickus
On Mon, Aug 18, 2014 at 11:03:19AM +0200, Ingo Molnar wrote: > * Don Zickus wrote: > > > From: chai wen > > > > For now, soft lockup detector warns once for each case of process > > softlockup. > > But the thread 'watchdog/n' may not always get

Re: [PATCH 3/5] watchdog: fix print-once on enable

2014-08-18 Thread Don Zickus
On Mon, Aug 18, 2014 at 11:07:57AM +0200, Ingo Molnar wrote: > > * Don Zickus wrote: > > > --- a/kernel/watchdog.c > > +++ b/kernel/watchdog.c > > @@ -522,6 +522,9 @@ static void watchdog_nmi_disable(unsigned int cpu) > > /* should b

[PATCH 0/5] watchdog: various fixes

2014-08-11 Thread Don Zickus
Just respinning these patches with my sign-off. I keep forgetting which is easier for Andrew to digest (this way or just me replying with an ack). Ulrich Obergfell (3): watchdog: fix print-once on enable watchdog: control hard lockup detection default kvm: ensure hard lockup detection is di

[PATCH 4/5] watchdog: control hard lockup detection default

2014-08-11 Thread Don Zickus
ch of this series. Other hypervisor guest types may find it useful as well. Signed-off-by: Ulrich Obergfell Signed-off-by: Andrew Jones Signed-off-by: Don Zickus --- include/linux/nmi.h |9 + kernel/watchdog.c | 50 -- 2 files change

[PATCH 1/5] watchdog: remove unnecessary head files

2014-08-11 Thread Don Zickus
From: chai wen Signed-off-by: chai wen Signed-off-by: Don Zickus --- kernel/watchdog.c |5 - 1 files changed, 0 insertions(+), 5 deletions(-) diff --git a/kernel/watchdog.c b/kernel/watchdog.c index c3319bd..4c2e11c 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -15,11

[PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu

2014-08-11 Thread Don Zickus
cess", as there may be a different process that is going to hog the cpu. Resolve this by saving/checking the pid of the hogging process and use that to reset soft_watchdog_warn too. Signed-off-by: chai wen [modified the comment and changelog to be more specific] Signed-off-by: Don Zickus --
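The approach described in this changelog, saving the pid of the hogging task and comparing it on later checks, can be sketched in user-space C roughly as follows. This is only an illustration under assumptions: `struct cpu_watch`, `warned_pid`, and `should_warn()` are hypothetical names, `warned` merely stands in for `soft_watchdog_warn`, and the actual patch body is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state; not the kernel's actual data layout. */
struct cpu_watch {
	bool warned;     /* stands in for soft_watchdog_warn */
	int  warned_pid; /* pid of the task the last warning was about */
};

/*
 * Decide whether the detector should print a warning.
 * lockup:  true if the soft lockup check fired on this CPU
 * cur_pid: pid of the task currently running on this CPU
 */
static bool should_warn(struct cpu_watch *w, bool lockup, int cur_pid)
{
	assert(w != NULL);

	if (!lockup) {
		/* The CPU made progress: re-arm the warning. */
		w->warned = false;
		return false;
	}

	/* The same task was already reported: stay quiet. */
	if (w->warned && w->warned_pid == cur_pid)
		return false;

	/* First report, or a different task is now hogging the CPU. */
	w->warned = true;
	w->warned_pid = cur_pid;
	return true;
}
```

With this shape, a second hogging task is reported even if the warning flag was never cleared in between, which appears to be the case the changelog is concerned with.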

[PATCH 3/5] watchdog: fix print-once on enable

2014-08-11 Thread Don Zickus
== 0 || cpu0_err) pr_info("enabled on all CPUs, ...") The patch avoids this by clearing cpu0_err in watchdog_nmi_disable(). Signed-off-by: Ulrich Obergfell Signed-off-by: Andrew Jones Signed-off-by: Don Zickus --- kernel/watchdog.c |3 +++ 1 files changed, 3 insertions(+), 0 del

[PATCH 5/5] kvm: ensure hard lockup detection is disabled by default

2014-08-11 Thread Don Zickus
ned-off-by: Andrew Jones Signed-off-by: Don Zickus --- arch/x86/kernel/kvm.c |8 1 files changed, 8 insertions(+), 0 deletions(-) diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 3dd8e2c..95c3cb1 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -

Re: [PATCH 2/3] watchdog: control hard lockup detection default

2014-07-30 Thread Don Zickus
On Wed, Jul 30, 2014 at 04:16:38PM +0200, Paolo Bonzini wrote: > Il 30/07/2014 15:43, Don Zickus ha scritto: > >> > Nice catch. Looks like this will need a v2. Paolo, do we have a > >> > consensus on the proc echoing? Or should that be revisited in the v2 as > >>

Re: [PATCH 2/3] watchdog: control hard lockup detection default

2014-07-30 Thread Don Zickus
On Fri, Jul 25, 2014 at 01:25:11PM +0200, Andrew Jones wrote: > > to enable hard lockup detection explicitly. > > > > I think changing the 'watchdog_thresh' while 'watchdog_running' is true > > should > > _not_ enable hard lockup detection as a side-effect, because a user may > > have a > > 'sys

Re: [patch 0/2] generic kernel watchdog reset at pvclock read (v2)

2013-10-16 Thread Don Zickus
On Fri, Oct 11, 2013 at 09:39:24PM -0300, Marcelo Tosatti wrote: > v2: > - do not create hung_task.h, move defines to sched.h (Don Zickus) > - switch patch order (Paolo) As long as it solves kvm's problems, I am ok with it. Marcelo, are there still corner cases out there tha

Re: [patch 2/3] pvclock: detect watchdog reset at pvclock read

2013-10-16 Thread Don Zickus
On Wed, Oct 09, 2013 at 06:26:33PM -0300, Marcelo Tosatti wrote: > From https://lkml.org/lkml/2013/7/3/675: > > "Agree. However, can't see how there is a way around "having custom > kvm/paravirt splat all over", for watchdogs that do: > > 1. check for watchdog resets > 2. read time via sched_cloc

Re: [patch 2/3] pvclock: detect watchdog reset at pvclock read

2013-10-09 Thread Don Zickus
On Tue, Oct 08, 2013 at 07:08:11PM -0300, Marcelo Tosatti wrote: > On Tue, Oct 08, 2013 at 09:37:05AM -0400, Don Zickus wrote: > > On Mon, Oct 07, 2013 at 10:05:17PM -0300, Marcelo Tosatti wrote: > > > Implement reset of kernel watchdogs at pvclock read time. This avoids > >

Re: [patch 2/3] pvclock: detect watchdog reset at pvclock read

2013-10-08 Thread Don Zickus
> > Suggested by Don Zickus. > > Signed-off-by: Marcelo Tosatti Awesome. Thanks for figuring this out Marcelo. Does that mean we can revert commit 5d1c0f4a now? :-) This meets my expectations. I'll leave it to the virt folks to figure out if this covers all the c

Re: [patch 1/3] hung_task: add method to reset detector

2013-10-08 Thread Don Zickus
On Mon, Oct 07, 2013 at 10:05:16PM -0300, Marcelo Tosatti wrote: > On certain occasions it is possible for a hung task detector > positive to be false: continuation from a paused VM, for example. > > Add a method to reset detection, similar to what is done > with other kernel watchdogs. This makes sen

Re: watchdog: print stolen time increment at softlockup detection

2013-07-03 Thread Don Zickus
On Fri, Jun 28, 2013 at 05:37:39PM -0300, Marcelo Tosatti wrote: > On Fri, Jun 28, 2013 at 10:12:15AM -0400, Don Zickus wrote: > > On Thu, Jun 27, 2013 at 11:57:23PM -0300, Marcelo Tosatti wrote: > > > > > > One possibility for a softlockup report in a Linux VM, is th

Re: watchdog: print stolen time increment at softlockup detection

2013-06-28 Thread Don Zickus
On Thu, Jun 27, 2013 at 11:57:23PM -0300, Marcelo Tosatti wrote: > > One possibility for a softlockup report in a Linux VM, is that the host > system is overcommitted to the point where the watchdog task is unable > to make progress (unable to touch the watchdog). I think I am confused on the VM/

Re: WARNING: at arch/x86/kernel/smp.c:119 native_smp_send_reschedule+0x25/0x43()

2012-03-23 Thread Don Zickus
utdown path: foreach_online_cpu cpu_down but I get occasional hangs on reboot that I haven't gotten around to debugging. I assumed this is the approach Peter was suggesting though I don't think he was sure if it was going to be reliable. Cheers, Don > > On Fri, Feb 10, 2012 at 11:04 PM, Do

Re: WARNING: at arch/x86/kernel/smp.c:119 native_smp_send_reschedule+0x25/0x43()

2012-02-10 Thread Don Zickus
On Fri, Feb 10, 2012 at 09:36:03PM +0100, Peter Zijlstra wrote: > On Fri, 2012-02-10 at 15:31 -0500, Don Zickus wrote: > > So my second patch which I will eventually post will just skip the WARN_ON > > if the system is going down. Not sure if that is the proper way to address >

Re: WARNING: at arch/x86/kernel/smp.c:119 native_smp_send_reschedule+0x25/0x43()

2012-02-10 Thread Don Zickus
On Fri, Feb 10, 2012 at 09:18:41PM +0100, Peter Zijlstra wrote: > On Fri, 2012-02-10 at 15:02 -0500, Don Zickus wrote: > > I also ran into the same problem you did and hacked up another patch that > > checked a global atomic variable that let the system know we were shutting > >

Re: WARNING: at arch/x86/kernel/smp.c:119 native_smp_send_reschedule+0x25/0x43()

2012-02-10 Thread Don Zickus
On Fri, Feb 10, 2012 at 08:03:53PM +0100, Peter Zijlstra wrote: > On Fri, 2012-02-10 at 19:58 +0100, Peter Zijlstra wrote: > > OK, so a 'modern' kernel does it slightly different and I've no idea > > what exactly goes wrong in your vintage version. But I can see the > > current stuff going at it al

Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking

2011-09-14 Thread Don Zickus
On Wed, Sep 14, 2011 at 10:00:07AM +0300, Avi Kivity wrote: > On 09/13/2011 10:21 PM, Don Zickus wrote: > >Or are you saying an NMI in an idle system will have the same %rip thus > >falsely detecting a back-to-back NMI? > > > > > > That's easy to avoid -

Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking

2011-09-13 Thread Don Zickus
On Tue, Sep 13, 2011 at 09:58:38PM +0200, Andi Kleen wrote: > > Or are you saying an NMI in an idle system will have the same %rip thus > > falsely detecting a back-to-back NMI? > > Yup. Hmm. That sucks. Is there another register that can be used in conjunction to separate it, like sp or someth

Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking

2011-09-13 Thread Don Zickus
On Tue, Sep 13, 2011 at 09:03:20PM +0200, Andi Kleen wrote: > > So I got around to implementing this and it seems to work great. The back > > to back NMIs are detected properly using the %rip and that info is passed to > > the NMI notifier. That info is used to determine if only the first > > han
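The detection idea quoted above, treating an NMI as back-to-back when it interrupts the same %rip as the previous one, could look roughly like the sketch below. All names (`struct nmi_track`, `nmi_is_back_to_back()`, `nmi_track_reset()`) are hypothetical, and the reset helper is an assumption about how the idle-CPU false positive raised elsewhere in this thread might be avoided; none of this is taken from the posted patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU record of the previous NMI's interrupted address. */
struct nmi_track {
	bool valid;             /* false until an NMI has been recorded */
	unsigned long last_rip; /* instruction pointer seen by the last NMI */
};

/*
 * Record this NMI and report whether it interrupted the same
 * instruction pointer as the previous one, i.e. no other code
 * appears to have run in between.
 */
static bool nmi_is_back_to_back(struct nmi_track *t, unsigned long rip)
{
	bool b2b;

	assert(t != NULL);
	b2b = t->valid && t->last_rip == rip;
	t->last_rip = rip;
	t->valid = true;
	return b2b;
}

/*
 * Assumed mitigation: forget the saved address whenever the CPU is
 * known to have executed other code (for example around idle), so an
 * idle CPU parked at one %rip is not mistaken for a back-to-back NMI.
 */
static void nmi_track_reset(struct nmi_track *t)
{
	assert(t != NULL);
	t->valid = false;
}
```

Comparing only %rip is what makes the idle case ambiguous; combining it with a second value such as the stack pointer, as asked later in the thread, would be a variation on the same comparison.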

Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking

2011-09-13 Thread Don Zickus
On Tue, Sep 13, 2011 at 09:03:20PM +0200, Andi Kleen wrote: > > So I got around to implementing this and it seems to work great. The back > > to back NMIs are detected properly using the %rip and that info is passed to > > the NMI notifier. That info is used to determine if only the first > > han

Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking

2011-09-13 Thread Don Zickus
On Wed, Sep 07, 2011 at 08:09:37PM +0300, Avi Kivity wrote: > >But then the downside > >here is we accidentally handle an NMI that was latched. This would cause > >a 'Dazed and confused' message as that NMI was already handled by the > >previous NMI. > > > >We are working on an algorithm to detect

Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking

2011-09-07 Thread Don Zickus
On Wed, Sep 07, 2011 at 08:09:37PM +0300, Avi Kivity wrote: > On 09/07/2011 07:52 PM, Don Zickus wrote: > >> > >> May I ask how? Detecting a back-to-back NMI? > > > >Pretty boring actually. Currently we execute an NMI handler until one of > >them returns

Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking

2011-09-07 Thread Don Zickus
On Wed, Sep 07, 2011 at 07:25:24PM +0300, Avi Kivity wrote: > On 09/07/2011 06:56 PM, Don Zickus wrote: > >> > >> And hope that no other NMI was generated while we're handling this > >> one. It's a little... fragile? > > > >No. If another

Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking

2011-09-07 Thread Don Zickus
On Wed, Sep 07, 2011 at 07:13:58AM +0300, Avi Kivity wrote: > On 09/06/2011 09:27 PM, Don Zickus wrote: > >On Tue, Sep 06, 2011 at 11:07:26AM -0700, Jeremy Fitzhardinge wrote: > >> >> But, erm, does that even make sense? I'm assuming the NMI reason port > >>

Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking

2011-09-07 Thread Don Zickus
On Wed, Sep 07, 2011 at 06:11:14PM +0300, Avi Kivity wrote: > On 09/07/2011 04:44 PM, Don Zickus wrote: > >> > >> Is there a way to tell whether an NMI was internally or externally > >> generated? > >> > >> I don't think so, especially a

Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking

2011-09-06 Thread Don Zickus
On Tue, Sep 06, 2011 at 11:07:26AM -0700, Jeremy Fitzhardinge wrote: > >> But, erm, does that even make sense? I'm assuming the NMI reason port > >> tells the CPU why it got an NMI. If multiple CPUs can get NMIs and > >> there's only a single reason port, then doesn't that mean that either 1) > >

Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking

2011-09-06 Thread Don Zickus
On Fri, Sep 02, 2011 at 02:50:53PM -0700, Jeremy Fitzhardinge wrote: > On 09/02/2011 01:47 PM, Peter Zijlstra wrote: > > On Fri, 2011-09-02 at 12:29 -0700, Jeremy Fitzhardinge wrote: > >>> I know that its generally considered bad form, but there's at least one > >>> spinlock that's only taken from