Re: [PATCH] tty/vt: Touch NMI watchdog in vt_console_print

2019-10-04 Thread Greg KH
On Fri, Sep 20, 2019 at 04:57:26PM +0800, Qiujun Huang wrote: > vt_console_print could trigger the NMI watchdog when writing is slow: > > [2858736.789664] NMI watchdog: Watchdog detected hard LOCKUP on cpu 23 > ... > [2858736.790194] CPU: 23 PID: 32504 Comm: tensorflow_mode Not taint

[PATCH] tty/vt: Touch NMI watchdog in vt_console_print

2019-09-20 Thread Qiujun Huang
vt_console_print could trigger the NMI watchdog when writing is slow: [2858736.789664] NMI watchdog: Watchdog detected hard LOCKUP on cpu 23 ... [2858736.790194] CPU: 23 PID: 32504 Comm: tensorflow_mode Not tainted 4.4.131-1.el7.elrepo.x86_64 #1 [2858736.790206] Hardware name: Huawei RH2288 V3
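The shape of the fix described in this thread can be sketched in user space. This is an illustrative stand-in, not the actual vt_console_print patch; `emit_char` and `slow_console_write` are hypothetical names, and `touch_nmi_watchdog` is stubbed so the sketch runs outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* User-space stand-ins for kernel helpers; in the kernel,
 * touch_nmi_watchdog() resets the hard-lockup detector's timer. */
static unsigned long watchdog_touches;
static char out[128];
static size_t out_len;

static void touch_nmi_watchdog(void) { watchdog_touches++; }
static void emit_char(char c) { out[out_len++] = c; }  /* stand-in for a slow device write */

/* A console write loop can run for a long time with interrupts off;
 * touching the watchdog on each iteration keeps a merely slow device
 * from being misreported as a hard lockup. */
static void slow_console_write(const char *buf, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        emit_char(buf[i]);
        touch_nmi_watchdog();
    }
}
```

The real patch places the touch inside the character-drawing loop so the detector is reset no matter how slow the video device is.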

[Regression]: NMI watchdog regression from v4.19 onwards

2019-03-08 Thread Maxime Coquelin
Hi Peter, Oleg, NMI watchdog fires systematically on my machine with recent Kernels, whereas the NMI watchdog is supposed to be disabled: # cat /proc/sys/kernel/watchdog 0 # cat /proc/sys/kernel/nmi_watchdog 0 # [ 53.765648] NMI watchdog: Watchdog detected hard LOCKUP on cpu 7 [ 53.765648

Re: NMI watchdog dump does not print on hard lockup

2018-10-22 Thread Sergey Senozhatsky
On (10/16/17 10:15), Steven Rostedt wrote: > On Mon, 16 Oct 2017 22:13:05 +0900 > Sergey Senozhatsky wrote: > > > just "brainstorming" it... with some silly ideas. > > > > pushing the data from NMI panic might look like we are replacing one > > deadlock scenario with another deadlock scenario. s

[PATCH 1/2] liblockdep: Stub NMI watchdog reset

2018-08-28 Thread Ben Hutchings
From: Bastian Blank lockdep.c now includes and requires touch_nmi_watchdog(), so provide those for liblockdep. Fixes: 88f1c87de11a ("locking/lockdep: Avoid triggering hardlockup from ...") [bwh: Write a longer description] Signed-off-by: Ben Hutchings --- tools/include/linux/nmi.h | 12 ++
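A minimal sketch of the kind of stub this patch describes, under the assumption that an empty inline is all user space needs (the actual patch lives in tools/include/linux/nmi.h; this is illustrative, not the patch itself):

```c
#include <assert.h>

/* lockdep.c calls touch_nmi_watchdog() to keep the kernel's hard-lockup
 * detector quiet during long dependency-graph walks. When the same file is
 * built into liblockdep in user space there is no NMI watchdog at all, so
 * an empty inline satisfies the reference at zero cost. */
static inline void touch_nmi_watchdog(void)
{
    /* nothing to do outside the kernel */
}
```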

[PATCH v2 10/11] arch/*: Kconfig: fix documentation for NMI watchdog

2018-05-09 Thread Mauro Carvalho Chehab
Changeset 9919cba7ff71 ("watchdog: Update documentation") updated the documentation, removing the old nmi_watchdog.txt and adding a file with a new content. Update Kconfig files accordingly. Fixes: 9919cba7ff71 ("watchdog: Update documentation") Signed-off-by: Mauro Carvalho Chehab --- arch/ar

[PATCH 3.16 144/294] PM/hibernate: touch NMI watchdog when creating snapshot

2017-11-06 Thread Ben Hutchings
image memory... NMI watchdog: Watchdog detected hard LOCKUP on cpu 27 CPU: 27 PID: 3128 Comm: systemd-sleep Not tainted 4.13.0-0.rc2.git0.1.fc27.x86_64 #1 task: 9f01971ac000 task.stack: b1a3f325c000 RIP: 0010:memory_bm_find_bit+0xf4/0x100 Call Trace: swsusp_set_page_free+0x2b

[PATCH 3.2 048/147] PM/hibernate: touch NMI watchdog when creating snapshot

2017-11-06 Thread Ben Hutchings
image memory... NMI watchdog: Watchdog detected hard LOCKUP on cpu 27 CPU: 27 PID: 3128 Comm: systemd-sleep Not tainted 4.13.0-0.rc2.git0.1.fc27.x86_64 #1 task: 9f01971ac000 task.stack: b1a3f325c000 RIP: 0010:memory_bm_find_bit+0xf4/0x100 Call Trace: swsusp_set_page_free+0x2b

Re: NMI watchdog dump does not print on hard lockup

2017-10-17 Thread Sergey Senozhatsky
On (10/16/17 10:15), Steven Rostedt wrote: [..] > > just "brainstorming" it... with some silly ideas. > > > > pushing the data from NMI panic might look like we are replacing one > > deadlock scenario with another deadlock scenario. some of the console > > drivers are so complex internally. so

Re: NMI watchdog dump does not print on hard lockup

2017-10-16 Thread Steven Rostedt
On Mon, 16 Oct 2017 22:13:05 +0900 Sergey Senozhatsky wrote: > just "brainstorming" it... with some silly ideas. > > pushing the data from NMI panic might look like we are replacing one > deadlock scenario with another deadlock scenario. some of the console > drivers are so complex internally

Re: NMI watchdog dump does not print on hard lockup

2017-10-16 Thread Sergey Senozhatsky
Hello, On (10/16/17 13:12), Petr Mladek wrote: [..] > > I think an NMI watchdog should just force the flush - the same way an > > oops should. Deadlocks aren't really relevant if something doesn't get > > printed out anyway. > > We explicitly flush the NM

Re: NMI watchdog dump does not print on hard lockup

2017-10-16 Thread Petr Mladek
sperate to keep going or see something. > I think an NMI watchdog should just force the flush - the same way an > oops should. Deadlocks aren't really relevant if something doesn't get > printed out anyway. We explicitly flush the NMI buffers in panic() when there is no other

Re: NMI watchdog dump does not print on hard lockup

2017-10-13 Thread Linus Torvalds
d, I suspect most people have just rebooted the machine. I think an NMI watchdog should just force the flush - the same way an oops should. Deadlocks aren't really relevant if something doesn't get printed out anyway. Linus

Re: NMI watchdog dump does not print on hard lockup

2017-10-13 Thread Steven Rostedt
On Fri, 13 Oct 2017 13:14:44 +0200 Petr Mladek wrote: > In general, we could either improve detection of situations when > the entire system is locked. It would be a reason to risk calling > consoles even in NMI. > > Or we could accept that the "default" printk is not good for all > situations a

Re: NMI watchdog dump does not print on hard lockup

2017-10-13 Thread Petr Mladek
CPUs were hard locked). > > Finally I did: > > on_each_cpu(lock_up_cpu, NULL, 0); > lock_up_cpu(tr); > > And boom! It locked up (lockdep was enabled, so I could see it showing > the deadlock), but then it stopped there. No output. The NMI watchdog > will only detec

Re: NMI watchdog dump does not print on hard lockup

2017-10-12 Thread Peter Zijlstra
On Thu, Oct 12, 2017 at 12:16:58PM -0400, Steven Rostedt wrote: > We need a way to have NMI flush to consoles when a lockup is detected, > and not depend on an irq_work to do so. Why do you think I never use that crap? early_printk FTW ;-)

NMI watchdog dump does not print on hard lockup

2017-10-12 Thread Steven Rostedt
ard locked). Finally I did: on_each_cpu(lock_up_cpu, NULL, 0); lock_up_cpu(tr); And boom! It locked up (lockdep was enabled, so I could see it showing the deadlock), but then it stopped there. No output. The NMI watchdog will only detect hard lockups if there is at least one CPU that

[PATCH 4.12 45/99] PM/hibernate: touch NMI watchdog when creating snapshot

2017-08-28 Thread Greg Kroah-Hartman
image memory... NMI watchdog: Watchdog detected hard LOCKUP on cpu 27 CPU: 27 PID: 3128 Comm: systemd-sleep Not tainted 4.13.0-0.rc2.git0.1.fc27.x86_64 #1 task: 9f01971ac000 task.stack: b1a3f325c000 RIP: 0010:memory_bm_find_bit+0xf4/0x100 Call Trace: swsusp_set_page_free+0x2b

RE: [PATCH V2 2/2] perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86

2017-06-12 Thread Liang, Kan
> When you re-send these patches that got reverted earlier, you really > should add yourself to the sign-off list.. and if you were the original > author, you should have been there from the first... > Hmm? Andi was the original author. Sure, I will add my signature after him. Thanks, Kan > On

Re: [PATCH V2 2/2] perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86

2017-06-11 Thread Andi Kleen
On Fri, Jun 09, 2017 at 08:39:59PM -0700, Linus Torvalds wrote: >Not commenting on the patch itself - I'll leave that to others. But the >sign-off chain is buggered. >When you re-send these patches that got reverted earlier, you really >should add yourself to the sign-off list.. and

[PATCH V2 2/2] perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86

2017-06-09 Thread kan . liang
From: Kan Liang The NMI watchdog uses either the fixed cycles or a generic cycles counter. This causes a lot of conflicts with users of the PMU who want to run a full group including the cycles fixed counter, for example the --topdown support recently added to perf stat. The code needs to fall

[tip:perf/urgent] perf stat: Only print NMI watchdog hint when enabled

2017-06-07 Thread tip-bot for Andi Kleen
print NMI watchdog hint when enabled Only print the NMI watchdog hint when that watchdog is actually enabled. This avoids printing these unnecessarily. Signed-off-by: Andi Kleen Acked-by: Jiri Olsa Link: http://lkml.kernel.org/n/tip-lnw7edxnqsphkmeew857w...@git.kernel.org Signed-off-by: Arnaldo

[PATCH 04/11] perf stat: Only print NMI watchdog hint when enabled

2017-06-06 Thread Arnaldo Carvalho de Melo
From: Andi Kleen Only print the NMI watchdog hint when that watchdog is actually enabled. This avoids printing these unnecessarily. Signed-off-by: Andi Kleen Acked-by: Jiri Olsa Link: http://lkml.kernel.org/n/tip-lnw7edxnqsphkmeew857w...@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo
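The check this patch describes amounts to reading the procfs knob before printing. The sketch below uses the real path (/proc/sys/kernel/nmi_watchdog) but the function names are illustrative, not perf's actual helpers:

```c
#include <assert.h>
#include <stdio.h>

/* Returns nonzero if the sysctl file at `path` reads "1".
 * A missing knob is treated as "disabled", so no hint is printed. */
static int nmi_watchdog_enabled(const char *path)
{
    char buf[8] = "";
    FILE *f = fopen(path, "r");

    if (!f)
        return 0;
    if (fgets(buf, sizeof(buf), f) == NULL)
        buf[0] = '\0';
    fclose(f);
    return buf[0] == '1';
}

static void maybe_print_hint(void)
{
    /* Only nag the user when the watchdog really holds a PMU counter. */
    if (nmi_watchdog_enabled("/proc/sys/kernel/nmi_watchdog"))
        printf("Hint: the NMI watchdog is using a PMU counter\n");
}
```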

Re: [PATCH 2/2] perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86

2017-05-22 Thread Andi Kleen
> > > The ref cycles always tick at their frequency, or slower when the system > > is idling. That means the NMI watchdog can never expire too early, > > unlike with cycles. > > > Just make the period longer, like 30% longer. Take the max turbo factor you > can

Re: [PATCH 2/2] perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86

2017-05-22 Thread Peter Zijlstra
On Mon, May 22, 2017 at 04:58:04PM +, Liang, Kan wrote: > > > > On Fri, May 19, 2017 at 10:06:22AM -0700, kan.li...@intel.com wrote: > > > This patch was once merged, but reverted later. > > > Because ref-cycles can not be used anymore when watchdog is enabled. > > > The commit is 44530d588e1

Re: [PATCH 2/2] perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86

2017-05-22 Thread Stephane Eranian
Andi, On Fri, May 19, 2017 at 10:06 AM, wrote: > From: Kan Liang > > The NMI watchdog uses either the fixed cycles or a generic cycles > counter. This causes a lot of conflicts with users of the PMU who want > to run a full group including the cycles fixed counter, for example

RE: [PATCH 2/2] perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86

2017-05-22 Thread Liang, Kan
> On Fri, May 19, 2017 at 10:06:22AM -0700, kan.li...@intel.com wrote: > > This patch was once merged, but reverted later. > > Because ref-cycles can not be used anymore when watchdog is enabled. > > The commit is 44530d588e142a96cf0cd345a7cb8911c4f88720 > > > > The patch 1/2 has extended the ref

Re: [PATCH 2/2] perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86

2017-05-22 Thread Peter Zijlstra
On Mon, May 22, 2017 at 02:03:21PM +0200, Peter Zijlstra wrote: > On Fri, May 19, 2017 at 10:06:22AM -0700, kan.li...@intel.com wrote: > > This patch was once merged, but reverted later. > > Because ref-cycles can not be used anymore when watchdog is enabled. > > The commit is 44530d588e142a96cf0cd

Re: [PATCH 2/2] perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86

2017-05-22 Thread Peter Zijlstra
On Fri, May 19, 2017 at 10:06:22AM -0700, kan.li...@intel.com wrote: > This patch was once merged, but reverted later. > Because ref-cycles can not be used anymore when watchdog is enabled. > The commit is 44530d588e142a96cf0cd345a7cb8911c4f88720 > > The patch 1/2 has extended the ref-cycles to GP

[PATCH 2/2] perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86

2017-05-19 Thread kan . liang
From: Kan Liang The NMI watchdog uses either the fixed cycles or a generic cycles counter. This causes a lot of conflicts with users of the PMU who want to run a full group including the cycles fixed counter, for example the --topdown support recently added to perf stat. The code needs to fall

[rbtree_test_init] ca92e6c7e6 [ 87.454390] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper:1]

2017-04-07 Thread Fengguang Wu
ee testing [ 59.360186] -> 571619 cycles [ 61.437068] augmented rbtree testing [ 87.454390] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper:1] [ 87.463514] CPU: 0 PID: 1 Comm: swapper Not tainted 4.10.0-rc4-00095-gca92e6c #1 [ 87.466569] Hardware name: QEMU Standard PC (i440FX

Re: [drm] 4e64e5539d [ 1138.272031] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper:1]

2017-03-29 Thread Gabriel Krisman Bertazi
Fengguang Wu writes: > Hi Chris, > [bisect result table, truncated: commits 17aad8a340 and 4e64e5539d vs. v4.11-rc3 and next-20170320]

[KASAN, ACPI] 80a9201a59 [ 56.736479] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]

2017-03-28 Thread Fengguang Wu
1354] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10 [ 21.262748] ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10 [ 32.256180] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11 [ 56.736479] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1] [ 56.738452] Modules

[acpi] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 24s! [swapper/0:1]

2017-03-12 Thread Fengguang Wu
[LNKD] enabled at IRQ 11 [ 22.614543] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10 [ 31.724849] ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10 [ 40.140941] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11 [ 65.545954] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 24s! [swapper/0:1] [

Re: PPro arch_cpu_idle: NMI watchdog: Watchdog detected hard LOCKUP on cpu 1

2017-03-06 Thread Meelis Roos
> > > > fine, > > > > > > > > 4.10.0-09686-g9e314890292c and 4.10.0-10770-g2d6be4abf514 > > > > > > > > exhibit a > > > > > > > > problem. Occasionally NMI watchdog kicks in and discovers one > > > >

Re: PPro arch_cpu_idle: NMI watchdog: Watchdog detected hard LOCKUP on cpu 1

2017-03-05 Thread Frederic Weisbecker
> > > > > > > 4.10.0-09686-g9e314890292c and 4.10.0-10770-g2d6be4abf514 exhibit > > > > > > > a > > > > > > > problem. Occasionally NMI watchdog kicks in and discovers one of > > > > > > > the > >

Re: PPro arch_cpu_idle: NMI watchdog: Watchdog detected hard LOCKUP on cpu 1

2017-03-05 Thread Meelis Roos
Added some CC-s because of bisect find. Whole context should be still here. > > > > > > This is on my trusty IBM PC365, dual Pentium Pro. 4.10 worked fine, > > > > > > 4.10.0-09686-g9e314890292c and 4.10.0-10770-g2d6be4abf514 exhibit a > > > >

Re: PPro arch_cpu_idle: NMI watchdog: Watchdog detected hard LOCKUP on cpu 1

2017-03-03 Thread Thomas Gleixner
292c and 4.10.0-10770-g2d6be4abf514 exhibit a > > > > > problem. Occasionally NMI watchdog kicks in and discovers one of the > > > > > CPUs in LOCKUP. The system keeps running fine. The first lockup was > > > > > different, all the others were from

Re: PPro arch_cpu_idle: NMI watchdog: Watchdog detected hard LOCKUP on cpu 1

2017-03-03 Thread Meelis Roos
> 4.10.0-09686-g9e314890292c and 4.10.0-10770-g2d6be4abf514 exhibit a > > > > > > problem. Occasionally NMI watchdog kicks in and discovers one of > > > > > > the > > > > > > CPUs in LOCKUP. The system keeps running fine. The first lockup

Re: PPro arch_cpu_idle: NMI watchdog: Watchdog detected hard LOCKUP on cpu 1

2017-03-02 Thread Meelis Roos
> > > > > This is on my trusty IBM PC365, dual Pentium Pro. 4.10 worked fine, > > > > > 4.10.0-09686-g9e314890292c and 4.10.0-10770-g2d6be4abf514 exhibit a > > > > > problem. Occasionally NMI watchdog kicks in and discovers one of the > > >

Re: PPro arch_cpu_idle: NMI watchdog: Watchdog detected hard LOCKUP on cpu 1

2017-03-02 Thread Thomas Gleixner
On Wed, 1 Mar 2017, Thomas Gleixner wrote: > On Thu, 2 Mar 2017, Meelis Roos wrote: > > > > > This is on my trusty IBM PC365, dual Pentium Pro. 4.10 worked fine, > > > > 4.10.0-09686-g9e314890292c and 4.10.0-10770-g2d6be4abf514 exhibit a > > > > probl

Re: PPro arch_cpu_idle: NMI watchdog: Watchdog detected hard LOCKUP on cpu 1

2017-03-01 Thread Meelis Roos
> > This is on my trusty IBM PC365, dual Pentium Pro. 4.10 worked fine, > > 4.10.0-09686-g9e314890292c and 4.10.0-10770-g2d6be4abf514 exhibit a > > problem. Occasionally NMI watchdog kicks in and discovers one of the > > CPUs in LOCKUP. The system keeps running fi

Re: PPro arch_cpu_idle: NMI watchdog: Watchdog detected hard LOCKUP on cpu 1

2017-03-01 Thread Thomas Gleixner
On Thu, 2 Mar 2017, Meelis Roos wrote: > > > This is on my trusty IBM PC365, dual Pentium Pro. 4.10 worked fine, > > > 4.10.0-09686-g9e314890292c and 4.10.0-10770-g2d6be4abf514 exhibit a > > > problem. Occasionally NMI watchdog kicks in and discovers one of the >

Re: PPro arch_cpu_idle: NMI watchdog: Watchdog detected hard LOCKUP on cpu 1

2017-03-01 Thread Meelis Roos
> > This is on my trusty IBM PC365, dual Pentium Pro. 4.10 worked fine, > > 4.10.0-09686-g9e314890292c and 4.10.0-10770-g2d6be4abf514 exhibit a > > problem. Occasionally NMI watchdog kicks in and discovers one of the > > CPUs in LOCKUP. The system keeps running fi

Re: PPro arch_cpu_idle: NMI watchdog: Watchdog detected hard LOCKUP on cpu 1

2017-03-01 Thread Thomas Gleixner
On Wed, 1 Mar 2017, Meelis Roos wrote: > This is on my trusty IBM PC365, dual Pentium Pro. 4.10 worked fine, > 4.10.0-09686-g9e314890292c and 4.10.0-10770-g2d6be4abf514 exhibit a > problem. Occasionally NMI watchdog kicks in and discovers one of the > CPUs in LOCKUP. The system k

PPro arch_cpu_idle: NMI watchdog: Watchdog detected hard LOCKUP on cpu 1

2017-03-01 Thread Meelis Roos
This is on my trusty IBM PC365, dual Pentium Pro. 4.10 worked fine, 4.10.0-09686-g9e314890292c and 4.10.0-10770-g2d6be4abf514 exhibit a problem. Occasionally NMI watchdog kicks in and discovers one of the CPUs in LOCKUP. The system keeps running fine. The first lockup was different, all the

Re: [tip:perf/core] Revert "perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86"

2016-07-10 Thread Andi Kleen
> Committer: Ingo Molnar > CommitDate: Sun, 10 Jul 2016 20:58:36 +0200 > > Revert "perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86" > > This reverts commit 2c95afc1e83d93fac3be6923465e1753c2c53b0a. > > Stephane reported the following regressi

[tip:perf/core] perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86

2016-06-14 Thread tip-bot for Andi Kleen
: Switch NMI watchdog to ref cycles on x86 The NMI watchdog uses either the fixed cycles or a generic cycles counter. This causes a lot of conflicts with users of the PMU who want to run a full group including the cycles fixed counter, for example the --topdown support recently added to perf stat. The

Re: [PATCH 2/2] perf stat: Remove nmi watchdog check code again

2016-06-10 Thread Arnaldo Carvalho de Melo
Em Thu, Jun 09, 2016 at 06:14:39AM -0700, Andi Kleen escreveu: > From: Andi Kleen > > Now that the NMI watchdog runs with reference cycles, and does not > conflict with TopDown anymore, we don't need to check that the > NMI watchdog is off in perf stat --topdown. > >

Re: [PATCH 2/2] perf stat: Remove nmi watchdog check code again

2016-06-09 Thread Arnaldo Carvalho de Melo
Em Thu, Jun 09, 2016 at 08:17:16AM -0700, Andi Kleen escreveu: > On Thu, Jun 09, 2016 at 10:42:08AM -0300, Arnaldo Carvalho de Melo wrote: > > Em Thu, Jun 09, 2016 at 06:14:39AM -0700, Andi Kleen escreveu: > > > Now that the NMI watchdog runs with reference cycles, and does not

Re: [PATCH 2/2] perf stat: Remove nmi watchdog check code again

2016-06-09 Thread Andi Kleen
On Thu, Jun 09, 2016 at 10:42:08AM -0300, Arnaldo Carvalho de Melo wrote: > Em Thu, Jun 09, 2016 at 06:14:39AM -0700, Andi Kleen escreveu: > > From: Andi Kleen > > > > Now that the NMI watchdog runs with reference cycles, and does not > > Now as in when? We should

Re: [PATCH 2/2] perf stat: Remove nmi watchdog check code again

2016-06-09 Thread Arnaldo Carvalho de Melo
Em Thu, Jun 09, 2016 at 06:14:39AM -0700, Andi Kleen escreveu: > From: Andi Kleen > > Now that the NMI watchdog runs with reference cycles, and does not Now as in when? We should at least warn the user that the kernel used is one where the NMI watchdog will not get in the way of to

[PATCH 2/2] perf stat: Remove nmi watchdog check code again

2016-06-09 Thread Andi Kleen
From: Andi Kleen Now that the NMI watchdog runs with reference cycles, and does not conflict with TopDown anymore, we don't need to check that the NMI watchdog is off in perf stat --topdown. Remove the code that does this and always use a group unconditionally. Signed-off-by: Andi

[PATCH 1/2] Switch NMI watchdog to ref cycles on x86

2016-06-09 Thread Andi Kleen
From: Andi Kleen The NMI watchdog uses either the fixed cycles or a generic cycles counter. This causes a lot of conflicts with users of the PMU who want to run a full group including the cycles fixed counter, for example the --topdown support recently added to perf stat. The code needs to fall

Re: [PATCH 1/2] Switch NMI watchdog to ref cycles on x86

2016-06-09 Thread Andi Kleen
On Thu, Jun 09, 2016 at 10:43:14AM +0200, Peter Zijlstra wrote: > On Wed, Jun 08, 2016 at 02:36:46PM -0700, Andi Kleen wrote: > > > This patch switches the NMI watchdog to use reference cycles > > Are you sure; it seems to only add an #include Right, sorry a git rebase went wr

Re: [PATCH 1/2] Switch NMI watchdog to ref cycles on x86

2016-06-09 Thread Peter Zijlstra
On Wed, Jun 08, 2016 at 02:36:46PM -0700, Andi Kleen wrote: > This patch switches the NMI watchdog to use reference cycles Are you sure; it seems to only add an #include > --- > arch/x86/kernel/apic/hw_nmi.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/ar

[PATCH 2/2] perf stat: Remove nmi watchdog check code again

2016-06-08 Thread Andi Kleen
From: Andi Kleen Now that the NMI watchdog runs with reference cycles, and does not conflict with TopDown anymore, we don't need to check that the NMI watchdog is off in perf stat --topdown. Remove the code that does this and always use a group unconditionally. Signed-off-by: Andi

[PATCH 1/2] Switch NMI watchdog to ref cycles on x86

2016-06-08 Thread Andi Kleen
From: Andi Kleen The NMI watchdog uses either the fixed cycles or a generic cycles counter. This causes a lot of conflicts with users of the PMU who want to run a full group including the cycles fixed counter, for example the --topdown support recently added to perf stat. The code needs to fall

BUG: NMI Watchdog detected LOCKUP on CPU19, ip ffffffff814c5aee, registers:

2016-03-02 Thread zhiqiang dang
7981: cpu0 unhandled wrmsr: 0x198 data 0 kvm: 17981: cpu1 unhandled wrmsr: 0x198 data 0 BUG: NMI Watchdog detected LOCKUP on CPU19, ip ffffffff814c5aee, registers: CPU 19 Modules linked in: act_police cls_u32 sch_ingress sch_htb ip6_tables iptable_filter ip_tables ebtable_nat ebtables stp llc openvswi

quad Opteron nmi watchdog soft lockup problems 4.2, 4,3, 4.4.1

2016-02-20 Thread Jurriaan
+0x97/0x1b0 [ 2069.620667] [] ? do_execveat_common.isra.33+0x540/0x700 [ 2069.620670] [] ? SyS_execve+0x35/0x40 [ 2069.620673] [] ? stub_execve+0x5/0x5 [ 2069.620676] [] ? entry_SYSCALL_64_fastpath+0x16/0x71 [ 2073.216504] NMI watchdog: BUG: soft lockup - CPU#45 stuck for 22s! [doit.sh:37356] [ 20

Re: [lkp] [fs] df4c0e36f1: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]

2015-11-02 Thread Andrey Ryabinin
2015-11-02 23:07 GMT+03:00 Dave Hansen : > On 11/02/2015 11:34 AM, Andrey Ryabinin wrote: >> >> [1.159450] augmented rbtree testing -> 23675 cycles >> [1.864996] >> It took less than a second, meanwhile in your case it didn't finish in >> 22 seconds. >> >>

Re: [lkp] [fs] df4c0e36f1: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]

2015-11-02 Thread Dave Hansen
On 11/02/2015 11:34 AM, Andrey Ryabinin wrote: >>> >> >>> >> [1.159450] augmented rbtree testing -> 23675 cycles >>> >> [1.864996] >>> >> It took less than a second, meanwhile in your case it didn't finish in >>> >> 22 seconds. >>> >> >>> >> This makes me think that your host is overloaded

Re: [lkp] [fs] df4c0e36f1: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]

2015-11-02 Thread Andrey Ryabinin
2015-11-02 20:39 GMT+03:00 Dave Hansen : > On 11/02/2015 01:32 AM, Andrey Ryabinin wrote: >> And the major factor here is number 2. >> >> In your dmesg: >> [ 67.891156] rbtree testing -> 570841 cycles >> [ 88.609636] augmented rbtree testing >> [

Re: [lkp] [fs] df4c0e36f1: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]

2015-11-02 Thread Dave Hansen
On 11/02/2015 01:32 AM, Andrey Ryabinin wrote: > And the major factor here is number 2. > > In your dmesg: > [ 67.891156] rbtree testing -> 570841 cycles > [ 88.609636] augmented rbtree testing > [ 116.546697] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! > [s

Re: [lkp] [fs] df4c0e36f1: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]

2015-11-02 Thread Andrey Ryabinin
NFIG_KASAN=y) 2. I suspect that your host is overloaded, thus KVM guest runs too slow. And the major factor here is number 2. In your dmesg: [ 67.891156] rbtree testing -> 570841 cycles [ 88.609636] augmented rbtree testing [ 116.546697] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swap

Re: [lkp] [fs] df4c0e36f1: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]

2015-11-02 Thread Dmitry Vyukov
> commit df4c0e36f1b1782b0611a77c52cc240e5c4752dd ("fs: dcache: manually > unpoison dname after allocation to shut up kasan's reports") > > > The commit fixed a KASan bug, but introduced or revealed a soft lockup > issue as follow. > > > [ 116.546697] N

[lkp] [fs] df4c0e36f1: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]

2015-11-02 Thread kernel test robot
t introduced or revealed a soft lockup issue as follow. [ 116.546697] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1] [ 116.546697] irq event stamp: 3018750 [ 116.546697] hardirqs last enabled at (3018749): [] restore_args+0x0/0x30 [ 116.546697] hardirqs last disab

Re: [PATCH 0/3] KVM: x86: legacy NMI watchdog fixes

2015-07-01 Thread Paolo Bonzini
On 30/06/2015 22:19, Radim Krčmář wrote: > Until v2.6.37, Linux used NMI watchdog that utilized IO-APIC and LVT0. > This series fixes some problems with APICv, restore, and concurrency > while keeping the monster asleep. Queued for 4.2. Paolo

[PATCH 0/3] KVM: x86: legacy NMI watchdog fixes

2015-06-30 Thread Radim Krčmář
Until v2.6.37, Linux used NMI watchdog that utilized IO-APIC and LVT0. This series fixes some problems with APICv, restore, and concurrency while keeping the monster asleep. Radim Krčmář (3): KVM: x86: keep track of LVT0 changes under APICv KVM: x86: properly restore LVT0 KVM: x86: make

[PATCH 3.13.y-ckt 145/156] sparc: Touch NMI watchdog when walking cpus and calling printk

2015-04-07 Thread Kamal Mostafa
., arch_trigger_all_cpu_backtrace) can take a long time to complete. If IRQs are disabled eventually the NMI watchdog kicks in and creates more havoc. Avoid by telling the NMI watchdog everything is ok. Signed-off-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Kamal Mostafa --- arch/sparc/kernel

[PATCH 3.12 017/155] sparc: Touch NMI watchdog when walking cpus and calling printk

2015-04-07 Thread Jiri Slaby
) can take a long time to complete. If IRQs are disabled eventually the NMI watchdog kicks in and creates more havoc. Avoid by telling the NMI watchdog everything is ok. Signed-off-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Jiri Slaby --- arch/sparc/kernel/process_64.c | 4

Re: NMI watchdog

2015-03-30 Thread Michal Hocko
es, when I return to my computer from being away for a > little while, I noticed: > Message from syslogd@redacted at Mar 23 XX:XX:XX ... > kernel:[1059322.470817] NMI watchdog: BUG: soft lockup - CPU#1 stuck > for 22s! [kswapd0:31] traces dumped as a part of the watchdog output is th

NMI watchdog

2015-03-30 Thread Justin Keller
ticed: Message from syslogd@redacted at Mar 23 XX:XX:XX ... kernel:[1059322.470817] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kswapd0:31] Dmesg | grep NMI produced: [1151200.727734] sending NMI to all CPUs: [1151200.727812] NMI backtrace for cpu 0 [1151200.764129] INFO: NMI ha

[PATCH 3.16.y-ckt 154/165] sparc: Touch NMI watchdog when walking cpus and calling printk

2015-03-25 Thread Luis Henriques
., arch_trigger_all_cpu_backtrace) can take a long time to complete. If IRQs are disabled eventually the NMI watchdog kicks in and creates more havoc. Avoid by telling the NMI watchdog everything is ok. Signed-off-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Luis Henriques --- arch/sparc/kernel

[PATCH 3.19 004/123] sparc: Touch NMI watchdog when walking cpus and calling printk

2015-03-24 Thread Greg Kroah-Hartman
., arch_trigger_all_cpu_backtrace) can take a long time to complete. If IRQs are disabled eventually the NMI watchdog kicks in and creates more havoc. Avoid by telling the NMI watchdog everything is ok. Signed-off-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- arch/sparc/kernel

[PATCH 3.14 06/79] sparc: Touch NMI watchdog when walking cpus and calling printk

2015-03-24 Thread Greg Kroah-Hartman
., arch_trigger_all_cpu_backtrace) can take a long time to complete. If IRQs are disabled eventually the NMI watchdog kicks in and creates more havoc. Avoid by telling the NMI watchdog everything is ok. Signed-off-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- arch/sparc/kernel

[PATCH 3.10 05/55] sparc: Touch NMI watchdog when walking cpus and calling printk

2015-03-24 Thread Greg Kroah-Hartman
., arch_trigger_all_cpu_backtrace) can take a long time to complete. If IRQs are disabled eventually the NMI watchdog kicks in and creates more havoc. Avoid by telling the NMI watchdog everything is ok. Signed-off-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- arch/sparc/kernel

Re: NMI watchdog triggering during load_balance

2015-03-09 Thread David Ahern
On 3/6/15 12:29 PM, Mike Galbraith wrote: On Fri, 2015-03-06 at 11:37 -0700, David Ahern wrote: But, I do not understand how the wrong topology is causing the NMI watchdog to trigger. In the end there are still N domains, M groups per domain and P cpus per group. Doesn't the balancing

Re: NMI watchdog triggering during load_balance

2015-03-07 Thread Peter Zijlstra
not understand how the wrong topology is causing the NMI watchdog > to trigger. In the end there are still N domains, M groups per domain and P > cpus per group. Doesn't the balancing walk over all of them irrespective of > physical topology? Not quite; so for regular load balancing onl

Re: NMI watchdog triggering during load_balance

2015-03-06 Thread Mike Galbraith
On Fri, 2015-03-06 at 11:37 -0700, David Ahern wrote: > But, I do not understand how the wrong topology is causing the NMI > watchdog to trigger. In the end there are still N domains, M groups per > domain and P cpus per group. Doesn't the balancing walk over all of them >

Re: NMI watchdog triggering during load_balance

2015-03-06 Thread David Ahern
g it. -- But, I do not understand how the wrong topology is causing the NMI watchdog to trigger. In the end there are still N domains, M groups per domain and P cpus per group. Doesn't the balancing walk over all of them irrespective of physical topology? Here's another data point that

Re: NMI watchdog triggering during load_balance

2015-03-06 Thread Mike Galbraith
On Fri, 2015-03-06 at 08:01 -0700, David Ahern wrote: > On 3/5/15 9:52 PM, Mike Galbraith wrote: > >> CPU970 attaching sched-domain: > >>domain 0: span 968-975 level SIBLING > >> groups: 8 single CPU groups > >> domain 1: span 968-975 level MC > >> groups: 1 group with 8 cpus > >>

Re: NMI watchdog triggering during load_balance

2015-03-06 Thread David Ahern
On 3/6/15 2:12 AM, Peter Zijlstra wrote: On Thu, Mar 05, 2015 at 09:05:28PM -0700, David Ahern wrote: Socket(s): 32 NUMA node(s): 4 Urgh, with 32 'cpus' per socket, you still do _8_ sockets per node, for a total of 256 cpus per node. Per the response to Mike, the system

Re: NMI watchdog triggering during load_balance

2015-03-06 Thread David Ahern
On 3/6/15 2:07 AM, Peter Zijlstra wrote: On Thu, Mar 05, 2015 at 09:05:28PM -0700, David Ahern wrote: Since each domain is a superset of the lower one each pass through load_balance regularly repeats the processing of the previous domain (e.g., NODE domain repeats the cpus in the CPU domain). Th

Re: NMI watchdog triggering during load_balance

2015-03-06 Thread David Ahern
On 3/6/15 1:51 AM, Peter Zijlstra wrote: On Thu, Mar 05, 2015 at 09:05:28PM -0700, David Ahern wrote: Hi Peter/Mike/Ingo: Does that make sense or am I off in the weeds? How much of your story pertains to 3.18? I'm not particularly interested in anything much older than that. No. All of the

Re: NMI watchdog triggering during load_balance

2015-03-06 Thread David Ahern
On 3/5/15 9:52 PM, Mike Galbraith wrote:
CPU970 attaching sched-domain:
 domain 0: span 968-975 level SIBLING
  groups: 8 single CPU groups
 domain 1: span 968-975 level MC
  groups: 1 group with 8 cpus
 domain 2: span 768-1023 level CPU
  groups: 4 groups with 256 cpus per grou

Re: NMI watchdog triggering during load_balance

2015-03-06 Thread Peter Zijlstra
On Thu, Mar 05, 2015 at 09:05:28PM -0700, David Ahern wrote: > Socket(s): 32 > NUMA node(s): 4 Urgh, with 32 'cpus' per socket, you still do _8_ sockets per node, for a total of 256 cpus per node. That's painful. I don't suppose you can really change the hardware, but that's
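The topology arithmetic Peter is reacting to can be sanity-checked quickly. A toy sketch, assuming the figures quoted in this thread (1024 cpus total, 32 sockets, 4 NUMA nodes); the variable names are illustrative, not from the kernel:

```python
# Sanity check of the topology arithmetic in this message.
# Assumed inputs from the thread: 1024 cpus, 32 sockets, 4 NUMA nodes.
total_cpus = 1024
sockets = 32
nodes = 4

cpus_per_socket = total_cpus // sockets   # 32 'cpus' per socket
sockets_per_node = sockets // nodes       # 8 sockets per node
cpus_per_node = total_cpus // nodes       # 256 cpus per node

print(cpus_per_socket, sockets_per_node, cpus_per_node)
```

This reproduces the "32 'cpus' per socket, _8_ sockets per node, 256 cpus per node" figures in the message.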

Re: NMI watchdog triggering during load_balance

2015-03-06 Thread Peter Zijlstra
On Thu, Mar 05, 2015 at 09:05:28PM -0700, David Ahern wrote: > Since each domain is a superset of the lower one each pass through > load_balance regularly repeats the processing of the previous domain (e.g., > NODE domain repeats the cpus in the CPU domain). Then multiplying that > across 1024 cpus
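The repeated-work argument above can be illustrated with a toy calculation. This is not the kernel's load_balance() code; it only sums the per-level domain spans from the dump quoted in this thread (SIBLING/MC/CPU, plus an assumed 1024-cpu NODE level) to show how a walk over nested superset domains re-visits lower-level cpus:

```python
# Illustrative arithmetic only -- not the kernel's load_balance() logic.
# Domain spans modeled on the sched-domain dump quoted in this thread;
# the 1024-cpu NODE span is an assumption for the whole machine.
domain_spans = {"SIBLING": 8, "MC": 8, "CPU": 256, "NODE": 1024}

def cpus_visited_per_pass(spans):
    """Each domain is a superset of the one below it, so a pass that
    walks every level touches roughly the sum of the spans."""
    return sum(spans.values())

per_cpu_pass = cpus_visited_per_pass(domain_spans)  # visits for one cpu's pass
total_visits = per_cpu_pass * 1024                  # if all 1024 cpus balance

print(per_cpu_pass, total_visits)
```

Even this crude model shows the cost multiplying across 1024 cpus, which is the scaling concern raised in the thread.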

Re: NMI watchdog triggering during load_balance

2015-03-06 Thread Peter Zijlstra
On Thu, Mar 05, 2015 at 09:05:28PM -0700, David Ahern wrote: > Hi Peter/Mike/Ingo: > > Does that make sense or am I off in the weeds? How much of your story pertains to 3.18? I'm not particularly interested in anything much older than that. -- To unsubscribe from this list: send the line "unsubsc

Re: NMI watchdog triggering during load_balance

2015-03-05 Thread Mike Galbraith
se cases (e.g.,
> http://www.cs.virginia.edu/stream/) that regularly trigger the NMI
> watchdog with the stack trace:
>
> Call Trace:
> @ [0045d3d0] double_rq_lock+0x4c/0x68
> @ [004699c4] load_balance+0x278/0x740
> @ [008a7b88] __schedule+0x378/0x8e

NMI watchdog triggering during load_balance

2015-03-05 Thread David Ahern
Hi Peter/Mike/Ingo: I've been banging my against this wall for a week now and hoping you or someone could shed some light on the problem. On larger systems (256 to 1024 cpus) there are several use cases (e.g., http://www.cs.virginia.edu/stream/) that regularly trigger the NMI watchdog

kernel:[ 1155.839866] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [kworker/u8:5:71]

2014-09-28 Thread Meelis Roos
While trying yesterdays 3.17 git snapshot on a i5-2400 + intel graphics computer, it seems to go into soft lockup. Still pings but no ssh to it any more, and active ssh session got the following kind of NMI watchdog soft lockups during aptitude list update: essage from syslogd@ilves at Sep 27

Re: linux-3.7.1: disable and reenable NMI watchdog => panic

2013-01-11 Thread Borislav Petkov
On Sun, Jan 06, 2013 at 11:53:41PM +0600, Alexander E. Patrakov wrote: > As illustrated by the screen photo [1] (sorry for bad quality), > disabling and then reenabling the NMI watchdog panics the kernel > (3.7.1). This is reproducible in qemu-kvm as well, but I could not get > th

[tip:core/urgent] Documentation: Reflect the new location of the NMI watchdog info

2012-10-21 Thread tip-bot for Jean Delvare
new location of the NMI watchdog info Commit 9919cba7 ("watchdog: Update documentation") moved the NMI watchdog documentation from nmi_watchdog.txt to lockup-watchdogs.txt. Update the index file accordingly. Signed-off-by: Jean Delvare Cc: Fernando Luis Vazquez Cao Cc: Randy Dunla

[Regression, post-3.5][PATCH] Revert "NMI watchdog: fix for lockup detector breakage on resume"

2012-08-04 Thread Rafael J. Wysocki
Revert commit 45226e9 (NMI watchdog: fix for lockup detector breakage on resume) which breaks resume from system suspend on my SH7372 Mackerel board (by causing a NULL pointer dereference to happen) and is generally wrong, because it abuses CPU hotplug notifications in a shamelessly blatant way

Re: [BUG] NMI watchdog alert with Linux 2.6.23.16

2008-02-25 Thread Andrew Morton
> end_request: I/O error, dev fd0, sector 0
> end_request: I/O error, dev fd0, sector 0
> kjournald starting. Commit interval 5 seconds
> EXT3-fs: mounted filesystem with ordered data mode.
> Real Time Clock Driver v1.12ac
>
> Fedora release 8 (Werewolf)
> Kernel 2.

[BUG] NMI watchdog alert with Linux 2.6.23.16

2008-02-24 Thread Chris Rankin
equest: I/O error, dev fd0, sector 0
end_request: I/O error, dev fd0, sector 0
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Real Time Clock Driver v1.12ac
Fedora release 8 (Werewolf)
Kernel 2.6.23.16 on an i686
volcano.underworld login: BUG: NM

Re: [BUG] 2.6.24 refuses to boot - NMI watchdog problem?

2008-02-08 Thread Chris Rankin
ng PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
BUG: NMI Watchdog detected LOCKUP on CPU0, eip c0102b07, registers:
Modules linked in:
Pid: 0, comm: swapper Not tainted (2.6.24.1 #1)
EIP: 0060:[] EFLAGS: 0246 CPU: 0
EIP is at default_idle+0x2f/0x

[x86_64] Cannot get the NMI watchdog to work

2008-02-08 Thread Maarten Maathuis
t any reference of hpet in my dmesg. - oprofile module is not loaded. - My interrupts seem to have IO-APIC-* mode. - My chipset is a nvidia CK804. Can anyone shed some light on what is needed to get a working NMI watchdog? Sincerely, Maarten Maathuis. -- To unsubscribe from this list: send the
