On Thu, Aug 21, 2014 at 09:37:04AM +0800, Chai Wen wrote:
> On 08/19/2014 09:36 AM, Chai Wen wrote:
>
> > On 08/19/2014 04:38 AM, Don Zickus wrote:
> >
> >> On Mon, Aug 18, 2014 at 09:02:00PM +0200, Ingo Molnar wrote:
> >>>
> >>> * Don
On Mon, Aug 18, 2014 at 09:02:00PM +0200, Ingo Molnar wrote:
>
> * Don Zickus wrote:
>
> > > > > So I agree with the motivation of this improvement, but
> > > > > is this implementation namespace-safe?
> > > >
> > > > What name
On Mon, Aug 18, 2014 at 08:07:35PM +0200, Ingo Molnar wrote:
>
> * Don Zickus wrote:
>
> > On Mon, Aug 18, 2014 at 11:16:44AM +0200, Ingo Molnar wrote:
> > >
> > > * Don Zickus wrote:
> > >
> > > > The running kernel still has the abilit
On Mon, Aug 18, 2014 at 08:01:58PM +0200, Ingo Molnar wrote:
> > > > duration = is_softlockup(touch_ts);
> > > > if (unlikely(duration)) {
> > > > + pid_t pid = task_pid_nr(current);
> > > > +
> > > > /*
> > > > * If a virtual machine i
On Mon, Aug 18, 2014 at 11:16:44AM +0200, Ingo Molnar wrote:
>
> * Don Zickus wrote:
>
> > The running kernel still has the ability to enable/disable at any
> > time with /proc/sys/kernel/nmi_watchdog as usual. However even
> > when the default has been overridden /p
On Mon, Aug 18, 2014 at 11:12:39AM +0200, Ingo Molnar wrote:
>
> * Don Zickus wrote:
>
> > From: Ulrich Obergfell
> >
> > In some cases we don't want hard lockup detection enabled by default.
> > An example is when running as a guest. Introduce
> >
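The preview cuts off at "Introduce", but the rest of the thread suggests the series adds a simple default switch plus a query helper in kernel/watchdog.c. A minimal sketch, using the helper names from the posted patches (treat them as illustrative):

/* kernel/watchdog.c (sketch) */
static bool hardlockup_detector_enabled = true;

/*
 * Allow early platform code (e.g. a paravirt guest setup path) to
 * override the default before the watchdog threads are started.
 */
void watchdog_enable_hardlockup_detector(bool val)
{
	hardlockup_detector_enabled = val;
}

bool watchdog_hardlockup_detector_is_enabled(void)
{
	return hardlockup_detector_enabled;
}

watchdog_nmi_enable() would then consult the query helper and skip creating the perf event when the default is off, while an explicit write to /proc/sys/kernel/nmi_watchdog can still turn hard lockup detection back on at runtime.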
On Mon, Aug 18, 2014 at 11:03:19AM +0200, Ingo Molnar wrote:
> * Don Zickus wrote:
>
> > From: chai wen
> >
> > For now, soft lockup detector warns once for each case of process
> > softlockup.
> > But the thread 'watchdog/n' may not always get
On Mon, Aug 18, 2014 at 11:07:57AM +0200, Ingo Molnar wrote:
>
> * Don Zickus wrote:
>
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -522,6 +522,9 @@ static void watchdog_nmi_disable(unsigned int cpu)
> > /* should b
Just respinning these patches with my sign-off. I keep forgetting which is
easier for Andrew to digest (this way or just me replying with an ack).
Ulrich Obergfell (3):
watchdog: fix print-once on enable
watchdog: control hard lockup detection default
kvm: ensure hard lockup detection is disabled by default
ch
of this series. Other hypervisor guest types may find it useful as
well.
Signed-off-by: Ulrich Obergfell
Signed-off-by: Andrew Jones
Signed-off-by: Don Zickus
---
include/linux/nmi.h |  9 +
kernel/watchdog.c   | 50 --
2 files changed
From: chai wen
Signed-off-by: chai wen
Signed-off-by: Don Zickus
---
kernel/watchdog.c | 5 -
1 files changed, 0 insertions(+), 5 deletions(-)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index c3319bd..4c2e11c 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -15,11
cess", as there may
be a different process that is going to hog the cpu. Resolve this by
saving/checking the pid of the hogging process and use that to reset
soft_watchdog_warn too.
Signed-off-by: chai wen
[modified the comment and changelog to be more specific]
Signed-off-by: Don Zickus
--
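Putting the hunk quoted earlier (pid_t pid = task_pid_nr(current)) together with this changelog, the shape of the fix is roughly the following. This is a sketch, not the verbatim patch; the per-CPU variable names are illustrative:

/* kernel/watchdog.c, inside the watchdog timer function (sketch) */
static DEFINE_PER_CPU(bool, soft_watchdog_warn);
static DEFINE_PER_CPU(pid_t, softlockup_warn_pid_saved);

	duration = is_softlockup(touch_ts);
	if (unlikely(duration)) {
		pid_t pid = task_pid_nr(current);

		if (__this_cpu_read(soft_watchdog_warn) == true) {
			/*
			 * If a new process is now hogging the CPU, its pid
			 * differs from the one saved at the first warning:
			 * clear the flag so the new offender gets its own
			 * splat instead of being silently ignored.
			 */
			if (__this_cpu_read(softlockup_warn_pid_saved) != pid)
				__this_cpu_write(soft_watchdog_warn, false);
			return HRTIMER_RESTART;
		}

		__this_cpu_write(softlockup_warn_pid_saved, pid);
		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us!\n",
			 smp_processor_id(), duration);
		__this_cpu_write(soft_watchdog_warn, true);
	}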
== 0 || cpu0_err)
pr_info("enabled on all CPUs, ...")
The patch avoids this by clearing cpu0_err in watchdog_nmi_disable().
Signed-off-by: Ulrich Obergfell
Signed-off-by: Andrew Jones
Signed-off-by: Don Zickus
---
kernel/watchdog.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
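The full fix is small; reconstructed from the fragments quoted above ("/* should b..." and the cpu0_err condition), it looks roughly like this sketch:

/* kernel/watchdog.c (sketch of the fix) */
static void watchdog_nmi_disable(unsigned int cpu)
{
	struct perf_event *event = per_cpu(watchdog_ev, cpu);

	if (event) {
		perf_event_disable(event);
		per_cpu(watchdog_ev, cpu) = NULL;

		/* should be in cleanup, but blocks oprofile */
		perf_event_release_kernel(event);
	}
	if (cpu == 0) {
		/* watchdog_nmi_enable() expects this to be zero initially */
		cpu0_err = 0;
	}
}

With a stale non-zero cpu0_err left behind by a previous disable, the "cpu == 0 || cpu0_err" condition quoted above would be true for every CPU, and the "enabled on all CPUs" message would be printed once per CPU instead of once.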
ned-off-by: Andrew Jones
Signed-off-by: Don Zickus
---
arch/x86/kernel/kvm.c | 8
1 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 3dd8e2c..95c3cb1 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -
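The hunk itself is truncated, but based on the cover letter above the change amounts to flipping the default from the KVM guest init path. A sketch, assuming the helper named earlier in the series:

/* arch/x86/kernel/kvm.c (sketch) */
static void __init kvm_guest_init(void)
{
	/* ... existing kvm paravirt setup ... */

#ifdef CONFIG_HARDLOCKUP_DETECTOR
	/*
	 * Hard lockup detection is unreliable in a guest: the host can
	 * preempt the whole VM for long enough to trip the NMI watchdog.
	 * Keep it off by default; /proc/sys/kernel/nmi_watchdog can
	 * still enable it explicitly.
	 */
	watchdog_enable_hardlockup_detector(false);
#endif
}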
On Wed, Jul 30, 2014 at 04:16:38PM +0200, Paolo Bonzini wrote:
> On 30/07/2014 15:43, Don Zickus wrote:
> >> > Nice catch. Looks like this will need a v2. Paolo, do we have a
> >> > consensus on the proc echoing? Or should that be revisited in the v2 as
> >>
On Fri, Jul 25, 2014 at 01:25:11PM +0200, Andrew Jones wrote:
> > to enable hard lockup detection explicitly.
> >
> > I think changing the 'watchdog_thresh' while 'watchdog_running' is true
> > should _not_ enable hard lockup detection as a side-effect, because a
> > user may have a 'sys
On Fri, Oct 11, 2013 at 09:39:24PM -0300, Marcelo Tosatti wrote:
> v2:
> - do not create hung_task.h, move defines to sched.h (Don Zickus)
> - switch patch order (Paolo)
As long as it solves kvm's problems, I am ok with it.
Marcelo,
Are there still corner cases out there tha
On Wed, Oct 09, 2013 at 06:26:33PM -0300, Marcelo Tosatti wrote:
> From https://lkml.org/lkml/2013/7/3/675:
>
> "Agree. However, can't see how there is a way around "having custom
> kvm/paravirt splat all over", for watchdogs that do:
>
> 1. check for watchdog resets
> 2. read time via sched_clock
On Tue, Oct 08, 2013 at 07:08:11PM -0300, Marcelo Tosatti wrote:
> On Tue, Oct 08, 2013 at 09:37:05AM -0400, Don Zickus wrote:
> > On Mon, Oct 07, 2013 at 10:05:17PM -0300, Marcelo Tosatti wrote:
> > > Implement reset of kernel watchdogs at pvclock read time. This avoids
> &g
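The approach that came out of this thread hooks the pvclock read path: when the host marks the guest as having been stopped, poke every detector that could otherwise fire spuriously. Roughly, following the posted patches (exact call sites may differ):

/* arch/x86/kernel/pvclock.c (sketch) */
void pvclock_touch_watchdogs(void)
{
	touch_softlockup_watchdog_sync();
	clear_sched_clock_stable();
	rcu_cpu_stall_reset();
	reset_hung_task_detector();
}

/* in pvclock_clocksource_read(), before returning the cycle count: */
	if (unlikely(src->flags & PVCLOCK_GUEST_STOPPED)) {
		src->flags &= ~PVCLOCK_GUEST_STOPPED;
		pvclock_touch_watchdogs();
	}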
>
> Suggested by Don Zickus.
>
> Signed-off-by: Marcelo Tosatti
Awesome. Thanks for figuring this out Marcelo. Does that mean we can
revert commit 5d1c0f4a now? :-)
This meets my expectations. I'll leave it to the virt folks to figure out
if this covers all the c
On Mon, Oct 07, 2013 at 10:05:16PM -0300, Marcelo Tosatti wrote:
> In certain occasions it is possible for a hung task detector
> positive to be false: continuation from a paused VM, for example.
>
> Add a method to reset detection, similar as is done
> with other kernel watchdogs.
This makes sense
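The method ended up being a one-shot flag that khungtaskd consumes before its next scan. A sketch of the mechanism, with names following the posted patch (treat as illustrative):

/* kernel/hung_task.c (sketch) */
static atomic_t reset_hung_task = ATOMIC_INIT(0);

void reset_hung_task_detector(void)
{
	atomic_set(&reset_hung_task, HUNG_TASK_BATCHING);
}
EXPORT_SYMBOL_GPL(reset_hung_task_detector);

/* in the khungtaskd main loop: */
	while (schedule_timeout_interruptible(timeout_jiffies(timeout)) == 0) {
		if (atomic_xchg(&reset_hung_task, 0))
			continue;	/* reset requested, skip this scan */
		check_hung_uninterruptible_tasks(timeout);
	}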
On Fri, Jun 28, 2013 at 05:37:39PM -0300, Marcelo Tosatti wrote:
> On Fri, Jun 28, 2013 at 10:12:15AM -0400, Don Zickus wrote:
> > On Thu, Jun 27, 2013 at 11:57:23PM -0300, Marcelo Tosatti wrote:
> > >
> > > One possibility for a softlockup report in a Linux VM, is th
On Thu, Jun 27, 2013 at 11:57:23PM -0300, Marcelo Tosatti wrote:
>
> One possibility for a softlockup report in a Linux VM, is that the host
> system is overcommitted to the point where the watchdog task is unable
> to make progress (unable to touch the watchdog).
I think I am confused on the VM/
shutdown path:
foreach_online_cpu
    cpu_down
but I get occasional hangs on reboot that I haven't gotten around to
debugging. I assumed this is the approach Peter was suggesting though I
don't think he was sure if it was going to be reliable.
Cheers,
Don
>
> On Fri, Feb 10, 2012 at 11:04 PM, Do
On Fri, Feb 10, 2012 at 09:36:03PM +0100, Peter Zijlstra wrote:
> On Fri, 2012-02-10 at 15:31 -0500, Don Zickus wrote:
> > So my second patch which I will eventually post will just skip the WARN_ON
> > if the system is going down. Not sure if that is the proper way to address
>
On Fri, Feb 10, 2012 at 09:18:41PM +0100, Peter Zijlstra wrote:
> On Fri, 2012-02-10 at 15:02 -0500, Don Zickus wrote:
> > I also ran into the same problem you did and hacked up another patch that
> > checked a global atomic variable that let the system know we were shutting
> &g
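Neither hack was posted in full here, but the shape both describe, a global flag set on the shutdown path and checked before warning, would look something like this. Purely illustrative; all names below are hypothetical, not from a merged patch:

/* hypothetical sketch of the workaround discussed above */
#include <linux/atomic.h>

static atomic_t system_shutting_down = ATOMIC_INIT(0);	/* hypothetical */

void note_shutdown_started(void)	/* hypothetical hook on the reboot path */
{
	atomic_set(&system_shutting_down, 1);
}

/* at the site that would otherwise splat during reboot: */
	if (!atomic_read(&system_shutting_down))
		WARN_ON(stuck);	/* 'stuck' stands in for the original condition */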
On Fri, Feb 10, 2012 at 08:03:53PM +0100, Peter Zijlstra wrote:
> On Fri, 2012-02-10 at 19:58 +0100, Peter Zijlstra wrote:
> > OK, so a 'modern' kernel does it slightly different and I've no idea
> > what exactly goes wrong in your vintage version. But I can see the
> > current stuff going at it al
On Wed, Sep 14, 2011 at 10:00:07AM +0300, Avi Kivity wrote:
> On 09/13/2011 10:21 PM, Don Zickus wrote:
> >Or are you saying an NMI in an idle system will have the same %rip thus
> >falsely detecting a back-to-back NMI?
> >
> >
>
> That's easy to avoid -
On Tue, Sep 13, 2011 at 09:58:38PM +0200, Andi Kleen wrote:
> > Or are you saying an NMI in an idle system will have the same %rip thus
> > falsely detecting a back-to-back NMI?
>
> Yup.
Hmm. That sucks. Is there another register that can be used in
conjunction to separate it, like sp or something
On Tue, Sep 13, 2011 at 09:03:20PM +0200, Andi Kleen wrote:
> > So I got around to implementing this and it seems to work great. The back
> > to back NMIs are detected properly using the %rip and that info is passed to
> > the NMI notifier. That info is used to determine if only the first
> > han
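The %rip trick being referred to: remember the instruction pointer the previous NMI interrupted; if the next NMI lands on the same %rip, it was very likely latched while the first was being handled. A sketch of that logic, modeled on what later landed in arch/x86/kernel/nmi.c (details simplified):

/* per-CPU bookkeeping for back-to-back NMI detection (sketch) */
static DEFINE_PER_CPU(bool, swallow_nmi);
static DEFINE_PER_CPU(unsigned long, last_nmi_rip);

static void default_do_nmi(struct pt_regs *regs)
{
	bool b2b = false;

	/*
	 * Same %rip as the last NMI means we almost certainly have a
	 * latched, back-to-back NMI; tell the handlers so they can
	 * decide whether to swallow an already-handled event.
	 */
	if (regs->ip == __this_cpu_read(last_nmi_rip))
		b2b = true;
	else
		__this_cpu_write(swallow_nmi, false);

	__this_cpu_write(last_nmi_rip, regs->ip);

	nmi_handle(NMI_LOCAL, regs, b2b);
	/* ... unknown-NMI handling elided ... */
}

As noted just above, an idle CPU sitting in hlt can hit the same %rip on unrelated NMIs, which is the false-positive case the rest of the thread worries about.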
On Wed, Sep 07, 2011 at 08:09:37PM +0300, Avi Kivity wrote:
> >But then the downside
> >here is we accidentally handle an NMI that was latched. This would cause
> >a 'Dazed and confused' message as that NMI was already handled by the
> >previous NMI.
> >
> >We are working on an algorithm to detect
On Wed, Sep 07, 2011 at 08:09:37PM +0300, Avi Kivity wrote:
> On 09/07/2011 07:52 PM, Don Zickus wrote:
> >>
> >> May I ask how? Detecting a back-to-back NMI?
> >
> >Pretty boring actually. Currently we execute an NMI handler until one of
> >them returns
On Wed, Sep 07, 2011 at 07:25:24PM +0300, Avi Kivity wrote:
> On 09/07/2011 06:56 PM, Don Zickus wrote:
> >>
> >> And hope that no other NMI was generated while we're handling this
> >> one. It's a little... fragile?
> >
> >No. If another
On Wed, Sep 07, 2011 at 07:13:58AM +0300, Avi Kivity wrote:
> On 09/06/2011 09:27 PM, Don Zickus wrote:
> >On Tue, Sep 06, 2011 at 11:07:26AM -0700, Jeremy Fitzhardinge wrote:
> >> >> But, erm, does that even make sense? I'm assuming the NMI reason port
> >>
On Wed, Sep 07, 2011 at 06:11:14PM +0300, Avi Kivity wrote:
> On 09/07/2011 04:44 PM, Don Zickus wrote:
> >>
> >> Is there a way to tell whether an NMI was internally or externally
> >> generated?
> >>
> >> I don't think so, especially a
On Tue, Sep 06, 2011 at 11:07:26AM -0700, Jeremy Fitzhardinge wrote:
> >> But, erm, does that even make sense? I'm assuming the NMI reason port
> >> tells the CPU why it got an NMI. If multiple CPUs can get NMIs and
> >> there's only a single reason port, then doesn't that mean that either 1)
> >
On Fri, Sep 02, 2011 at 02:50:53PM -0700, Jeremy Fitzhardinge wrote:
> On 09/02/2011 01:47 PM, Peter Zijlstra wrote:
> > On Fri, 2011-09-02 at 12:29 -0700, Jeremy Fitzhardinge wrote:
> >>> I know that its generally considered bad form, but there's at least one
> >>> spinlock that's only taken from