On Fri, Sep 20, 2019 at 04:57:26PM +0800, Qiujun Huang wrote:
> vt_console_print could trigger the NMI watchdog in case writing is slow:
>
> [2858736.789664] NMI watchdog: Watchdog detected hard LOCKUP on cpu 23
> ...
> [2858736.790194] CPU: 23 PID: 32504 Comm: tensorflow_mode Not taint
vt_console_print could trigger the NMI watchdog in case writing is slow:
[2858736.789664] NMI watchdog: Watchdog detected hard LOCKUP on cpu 23
...
[2858736.790194] CPU: 23 PID: 32504 Comm: tensorflow_mode Not tainted
4.4.131-1.el7.elrepo.x86_64 #1
[2858736.790206] Hardware name: Huawei RH2288 V3
Hi Peter, Oleg,
NMI watchdog fires systematically on my machine with recent kernels,
whereas the NMI watchdog is supposed to be disabled:
# cat /proc/sys/kernel/watchdog
0
# cat /proc/sys/kernel/nmi_watchdog
0
#
[ 53.765648] NMI watchdog: Watchdog detected hard LOCKUP on cpu 7
[ 53.765648
On (10/16/17 10:15), Steven Rostedt wrote:
> On Mon, 16 Oct 2017 22:13:05 +0900
> Sergey Senozhatsky wrote:
>
> > just "brainstorming" it... with some silly ideas.
> >
> > pushing the data from NMI panic might look like we are replacing one
> > deadlock scenario with another deadlock scenario. s
From: Bastian Blank
lockdep.c now includes <linux/nmi.h> and requires touch_nmi_watchdog(),
so provide those for liblockdep.
Fixes: 88f1c87de11a ("locking/lockdep: Avoid triggering hardlockup from ...")
[bwh: Write a longer description]
Signed-off-by: Ben Hutchings
---
tools/include/linux/nmi.h | 12 ++
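For reference, a stub along these lines would satisfy lockdep.c in userspace; this is a sketch consistent with the diffstat above, not necessarily the verbatim patch, since there is no NMI watchdog to touch outside the kernel:

/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _TOOLS_LINUX_NMI_H
#define _TOOLS_LINUX_NMI_H

/* Userspace has no NMI watchdog; a no-op keeps lockdep.c building. */
static inline void touch_nmi_watchdog(void)
{
}

#endif /* _TOOLS_LINUX_NMI_H */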
Changeset 9919cba7ff71 ("watchdog: Update documentation") updated
the documentation, removing the old nmi_watchdog.txt and adding
a file with new content.
Update Kconfig files accordingly.
Fixes: 9919cba7ff71 ("watchdog: Update documentation")
Signed-off-by: Mauro Carvalho Chehab
---
arch/ar
image memory...
NMI watchdog: Watchdog detected hard LOCKUP on cpu 27
CPU: 27 PID: 3128 Comm: systemd-sleep Not tainted
4.13.0-0.rc2.git0.1.fc27.x86_64 #1
task: 9f01971ac000 task.stack: b1a3f325c000
RIP: 0010:memory_bm_find_bit+0xf4/0x100
Call Trace:
swsusp_set_page_free+0x2b
On (10/16/17 10:15), Steven Rostedt wrote:
[..]
> > just "brainstorming" it... with some silly ideas.
> >
> > pushing the data from NMI panic might look like we are replacing one
> > deadlock scenario with another deadlock scenario. some of the console
> > drivers are so complex internally. so
On Mon, 16 Oct 2017 22:13:05 +0900
Sergey Senozhatsky wrote:
> just "brainstorming" it... with some silly ideas.
>
> pushing the data from NMI panic might look like we are replacing one
> deadlock scenario with another deadlock scenario. some of the console
> drivers are so complex internally
Hello,
On (10/16/17 13:12), Petr Mladek wrote:
[..]
> > I think an NMI watchdog should just force the flush - the same way an
> > oops should. Deadlocks aren't really relevant if something doesn't get
> > printed out anyway.
>
> We explicitly flush the NM
sperate to keep going or see something.
> I think an NMI watchdog should just force the flush - the same way an
> oops should. Deadlocks aren't really relevant if something doesn't get
> printed out anyway.
We explicitly flush the NMI buffers in panic() when there is
no other
d, I suspect most people have just
rebooted the machine.
I think an NMI watchdog should just force the flush - the same way an
oops should. Deadlocks aren't really relevant if something doesn't get
printed out anyway.
Linus
On Fri, 13 Oct 2017 13:14:44 +0200
Petr Mladek wrote:
> In general, we could either improve detection of situations when
> the entire system is locked. It would be a reason to risk calling
> consoles even in NMI.
>
> Or we could accept that the "default" printk is not good for all
> situations a
CPUs were hard locked).
>
> Finally I did:
>
> on_each_cpu(lock_up_cpu, NULL, 0);
> lock_up_cpu(tr);
>
> And boom! It locked up (lockdep was enabled, so I could see it showing
> the deadlock), but then it stopped there. No output. The NMI watchdog
> will only detec
On Thu, Oct 12, 2017 at 12:16:58PM -0400, Steven Rostedt wrote:
> We need a way to have NMI flush to consoles when a lockup is detected,
> and not depend on an irq_work to do so.
Why do you think I never use that crap? early_printk FTW ;-)
ard locked).
Finally I did:
on_each_cpu(lock_up_cpu, NULL, 0);
lock_up_cpu(tr);
And boom! It locked up (lockdep was enabled, so I could see it showing
the deadlock), but then it stopped there. No output. The NMI watchdog
will only detect hard lockups if there is at least one CPU that
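A minimal sketch of how a lock_up_cpu() callback like the one above could wedge every CPU with interrupts off (a hypothetical illustration, not the actual test code; lockup_lock is a made-up name):

#include <linux/irqflags.h>
#include <linux/smp.h>
#include <linux/spinlock.h>

static DEFINE_RAW_SPINLOCK(lockup_lock);

/*
 * Run on every CPU via on_each_cpu(lock_up_cpu, NULL, 0): the first
 * caller takes the lock and then deadlocks on itself, every other CPU
 * spins on the first acquisition -- all with IRQs disabled, so only
 * the NMI watchdog can still notice anything.
 */
static void lock_up_cpu(void *data)
{
	local_irq_disable();
	raw_spin_lock(&lockup_lock);
	raw_spin_lock(&lockup_lock);
}

With lockdep enabled, the recursive acquisition is reported as a deadlock, which matches the behaviour described above.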
> When you re-send these patches that got reverted earlier, you really
> should add yourself to the sign-off list.. and if you were the original
> author, you should have been there from the first...
> Hmm?
Andi was the original author.
Sure, I will add my signature after him.
Thanks,
Kan
> On
On Fri, Jun 09, 2017 at 08:39:59PM -0700, Linus Torvalds wrote:
>Not commenting on the patch itself - I'll leave that to others. But the
>sign-off chain is buggered.
>When you re-send these patches that got reverted earlier, you really
>should add yourself to the sign-off list.. and
From: Kan Liang
The NMI watchdog uses either the fixed cycles or a generic cycles
counter. This causes a lot of conflicts with users of the PMU who want
to run a full group including the cycles fixed counter, for example the
--topdown support recently added to perf stat. The code needs to fall
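The shape of the change, roughly (a sketch of the idea rather than the exact diff): the hard lockup detector's perf event moves off the fixed cycles counter onto ref-cycles, leaving the cycles counter free for groups such as perf stat --topdown.

#include <linux/perf_event.h>

static struct perf_event_attr wd_hw_attr = {
	.type		= PERF_TYPE_HARDWARE,
	/* was PERF_COUNT_HW_CPU_CYCLES */
	.config		= PERF_COUNT_HW_REF_CPU_CYCLES,
	.size		= sizeof(struct perf_event_attr),
	.pinned		= 1,
	.disabled	= 1,
};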
print NMI watchdog hint when enabled
Only print the NMI watchdog hint when that watchdog is actually enabled.
This avoids printing these unnecessarily.
Signed-off-by: Andi Kleen
Acked-by: Jiri Olsa
Link: http://lkml.kernel.org/n/tip-lnw7edxnqsphkmeew857w...@git.kernel.org
Signed-off-by: Arnaldo
From: Andi Kleen
Only print the NMI watchdog hint when that watchdog is actually enabled.
This avoids printing these unnecessarily.
Signed-off-by: Andi Kleen
Acked-by: Jiri Olsa
Link: http://lkml.kernel.org/n/tip-lnw7edxnqsphkmeew857w...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
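The check itself amounts to reading the sysctl before emitting the hint. A minimal userspace sketch (plain C, not the perf code itself):

#include <stdio.h>

static int nmi_watchdog_enabled(void)
{
	FILE *f = fopen("/proc/sys/kernel/nmi_watchdog", "r");
	int val = 0;

	if (!f)
		return 0;
	if (fscanf(f, "%d", &val) != 1)
		val = 0;
	fclose(f);
	return val != 0;
}

int main(void)
{
	/* Only nag about the stolen counter when the watchdog is on. */
	if (nmi_watchdog_enabled())
		fprintf(stderr,
			"Hint: disable the NMI watchdog (kernel.nmi_watchdog=0) to free up a counter\n");
	return 0;
}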
>
> > The ref cycles always tick at their frequency, or slower when the system
> > is idling. That means the NMI watchdog can never expire too early,
> > unlike with cycles.
> >
> Just make the period longer, like 30% longer. Take the max turbo factor you
> can
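Back-of-the-envelope version of that suggestion (a sketch only; max_turbo_x10 is a made-up parameter for illustration): stretch the cycles-based sample period by the maximum turbo ratio so the counter cannot overflow before watchdog_thresh seconds of wall time.

#include <linux/types.h>

static u64 watchdog_sample_period(unsigned int cpu_khz,
				  unsigned int watchdog_thresh,
				  unsigned int max_turbo_x10)
{
	/* Base period: watchdog_thresh seconds of nominal cycles. */
	u64 period = (u64)cpu_khz * 1000 * watchdog_thresh;

	/* Headroom for turbo, e.g. max_turbo_x10 = 13 means +30%. */
	return period * max_turbo_x10 / 10;
}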
On Mon, May 22, 2017 at 04:58:04PM +, Liang, Kan wrote:
>
>
> > On Fri, May 19, 2017 at 10:06:22AM -0700, kan.li...@intel.com wrote:
> > > This patch was once merged, but reverted later.
> > > Because ref-cycles can not be used anymore when watchdog is enabled.
> > > The commit is 44530d588e1
Andi,
On Fri, May 19, 2017 at 10:06 AM, wrote:
> From: Kan Liang
>
> The NMI watchdog uses either the fixed cycles or a generic cycles
> counter. This causes a lot of conflicts with users of the PMU who want
> to run a full group including the cycles fixed counter, for example
> On Fri, May 19, 2017 at 10:06:22AM -0700, kan.li...@intel.com wrote:
> > This patch was once merged, but reverted later.
> > Because ref-cycles can not be used anymore when watchdog is enabled.
> > The commit is 44530d588e142a96cf0cd345a7cb8911c4f88720
> >
> > The patch 1/2 has extended the ref
On Mon, May 22, 2017 at 02:03:21PM +0200, Peter Zijlstra wrote:
> On Fri, May 19, 2017 at 10:06:22AM -0700, kan.li...@intel.com wrote:
> > This patch was once merged, but reverted later.
> > Because ref-cycles can not be used anymore when watchdog is enabled.
> > The commit is 44530d588e142a96cf0cd
On Fri, May 19, 2017 at 10:06:22AM -0700, kan.li...@intel.com wrote:
> This patch was once merged, but reverted later.
> Because ref-cycles can not be used anymore when watchdog is enabled.
> The commit is 44530d588e142a96cf0cd345a7cb8911c4f88720
>
> The patch 1/2 has extended the ref-cycles to GP
ee testing
[ 59.360186] -> 571619 cycles
[ 61.437068] augmented rbtree testing
[ 87.454390] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper:1]
[ 87.463514] CPU: 0 PID: 1 Comm: swapper Not tainted
4.10.0-rc4-00095-gca92e6c #1
[ 87.466569] Hardware name: QEMU Standard PC (i440FX
Fengguang Wu writes:
> Hi Chris,
>
>>+------------+------------+------------+-----------+---------------+
>>|            | 17aad8a340 | 4e64e5539d | v4.11-rc3 | next-20170320 |
>>+------------+------------+------------+-----------+---------------+
1354] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10
[ 21.262748] ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
[ 32.256180] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
[ 56.736479] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s!
[swapper/0:1]
[ 56.738452] Modules
[LNKD] enabled at IRQ 11
[ 22.614543] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10
[ 31.724849] ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
[ 40.140941] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
[ 65.545954] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 24s!
[swapper/0:1]
[
> > > > fine,
> > > > > > > > 4.10.0-09686-g9e314890292c and 4.10.0-10770-g2d6be4abf514
> > > > > > > > exhibit a
> > > > > > > > problem. Occasionally NMI watchdog kicks in and discovers one
> > > >
> > > > > > > 4.10.0-09686-g9e314890292c and 4.10.0-10770-g2d6be4abf514 exhibit
> > > > > > > a
> > > > > > > problem. Occasionally NMI watchdog kicks in and discovers one of
> > > > > > > the
> > &g
Added some CCs because of the bisect find. The whole context should still be
here.
> > > > > > This is on my trusty IBM PC365, dual Pentium Pro. 4.10 worked fine,
> > > > > > 4.10.0-09686-g9e314890292c and 4.10.0-10770-g2d6be4abf514 exhibit a
> > > >
292c and 4.10.0-10770-g2d6be4abf514 exhibit a
> > > > > problem. Occasionally NMI watchdog kicks in and discovers one of the
> > > > > CPUs in LOCKUP. The system keeps running fine. The first lockup was
> > > > > different, all the others were from
> 4.10.0-09686-g9e314890292c and 4.10.0-10770-g2d6be4abf514 exhibit a
> > > > > > problem. Occasionally NMI watchdog kicks in and discovers one of
> > > > > > the
> > > > > > CPUs in LOCKUP. The system keeps running fine. The first lockup
> > > > > This is on my trusty IBM PC365, dual Pentium Pro. 4.10 worked fine,
> > > > > 4.10.0-09686-g9e314890292c and 4.10.0-10770-g2d6be4abf514 exhibit a
> > > > > problem. Occasionally NMI watchdog kicks in and discovers one of the
> > >
On Wed, 1 Mar 2017, Thomas Gleixner wrote:
> On Thu, 2 Mar 2017, Meelis Roos wrote:
>
> > > > This is on my trusty IBM PC365, dual Pentium Pro. 4.10 worked fine,
> > > > 4.10.0-09686-g9e314890292c and 4.10.0-10770-g2d6be4abf514 exhibit a
> > > > probl
> > This is on my trusty IBM PC365, dual Pentium Pro. 4.10 worked fine,
> > 4.10.0-09686-g9e314890292c and 4.10.0-10770-g2d6be4abf514 exhibit a
> > problem. Occasionally NMI watchdog kicks in and discovers one of the
> > CPUs in LOCKUP. The system keeps running fi
On Thu, 2 Mar 2017, Meelis Roos wrote:
> > > This is on my trusty IBM PC365, dual Pentium Pro. 4.10 worked fine,
> > > 4.10.0-09686-g9e314890292c and 4.10.0-10770-g2d6be4abf514 exhibit a
> > > problem. Occasionally NMI watchdog kicks in and discovers one of the
>
On Wed, 1 Mar 2017, Meelis Roos wrote:
> This is on my trusty IBM PC365, dual Pentium Pro. 4.10 worked fine,
> 4.10.0-09686-g9e314890292c and 4.10.0-10770-g2d6be4abf514 exhibit a
> problem. Occasionally NMI watchdog kicks in and discovers one of the
> CPUs in LOCKUP. The system k
This is on my trusty IBM PC365, dual Pentium Pro. 4.10 worked fine,
4.10.0-09686-g9e314890292c and 4.10.0-10770-g2d6be4abf514 exhibit a
problem. Occasionally NMI watchdog kicks in and discovers one of the
CPUs in LOCKUP. The system keeps running fine. The first lockup was
different, all the
> Committer: Ingo Molnar
> CommitDate: Sun, 10 Jul 2016 20:58:36 +0200
>
> Revert "perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86"
>
> This reverts commit 2c95afc1e83d93fac3be6923465e1753c2c53b0a.
>
> Stephane reported the following regressi
: Switch NMI watchdog to ref cycles on x86
The NMI watchdog uses either the fixed cycles or a generic cycles
counter. This causes a lot of conflicts with users of the PMU who want
to run a full group including the cycles fixed counter, for example
the --topdown support recently added to perf stat. The
Em Thu, Jun 09, 2016 at 06:14:39AM -0700, Andi Kleen escreveu:
> From: Andi Kleen
>
> Now that the NMI watchdog runs with reference cycles, and does not
> conflict with TopDown anymore, we don't need to check that the
> NMI watchdog is off in perf stat --topdown.
>
>
Em Thu, Jun 09, 2016 at 08:17:16AM -0700, Andi Kleen escreveu:
> On Thu, Jun 09, 2016 at 10:42:08AM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Thu, Jun 09, 2016 at 06:14:39AM -0700, Andi Kleen escreveu:
> > > Now that the NMI watchdog runs with reference cycles, and does not
On Thu, Jun 09, 2016 at 10:42:08AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Jun 09, 2016 at 06:14:39AM -0700, Andi Kleen escreveu:
> > From: Andi Kleen
> >
> > Now that the NMI watchdog runs with reference cycles, and does not
>
> Now as in when? We should
Em Thu, Jun 09, 2016 at 06:14:39AM -0700, Andi Kleen escreveu:
> From: Andi Kleen
>
> Now that the NMI watchdog runs with reference cycles, and does not
Now as in when? We should at least warn the user that the kernel used is
one where the NMI watchdog will not get in the way of to
From: Andi Kleen
Now that the NMI watchdog runs with reference cycles, and does not
conflict with TopDown anymore, we don't need to check that the
NMI watchdog is off in perf stat --topdown.
Remove the code that does this and always use a group unconditionally.
Signed-off-by: Andi
From: Andi Kleen
The NMI watchdog uses either the fixed cycles or a generic cycles
counter. This causes a lot of conflicts with users of the PMU who want
to run a full group including the cycles fixed counter, for example
the --topdown support recently added to perf stat. The code needs to
fall
On Thu, Jun 09, 2016 at 10:43:14AM +0200, Peter Zijlstra wrote:
> On Wed, Jun 08, 2016 at 02:36:46PM -0700, Andi Kleen wrote:
>
> > This patch switches the NMI watchdog to use reference cycles
>
> Are you sure; it seems to only add an #include
Right, sorry a git rebase went wr
On Wed, Jun 08, 2016 at 02:36:46PM -0700, Andi Kleen wrote:
> This patch switches the NMI watchdog to use reference cycles
Are you sure; it seems to only add an #include
> ---
> arch/x86/kernel/apic/hw_nmi.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/ar
7981: cpu0 unhandled wrmsr: 0x198 data 0
kvm: 17981: cpu1 unhandled wrmsr: 0x198 data 0
BUG: NMI Watchdog detected LOCKUP on CPU19, ip 814c5aee, registers:
CPU 19
Modules linked in: act_police cls_u32 sch_ingress sch_htb ip6_tables
iptable_filter ip_tables ebtable_nat ebtables stp llc openvswi
+0x97/0x1b0
[ 2069.620667] [] ? do_execveat_common.isra.33+0x540/0x700
[ 2069.620670] [] ? SyS_execve+0x35/0x40
[ 2069.620673] [] ? stub_execve+0x5/0x5
[ 2069.620676] [] ? entry_SYSCALL_64_fastpath+0x16/0x71
[ 2073.216504] NMI watchdog: BUG: soft lockup - CPU#45 stuck for 22s!
[doit.sh:37356]
[ 20
2015-11-02 23:07 GMT+03:00 Dave Hansen :
> On 11/02/2015 11:34 AM, Andrey Ryabinin wrote:
>>
>> [1.159450] augmented rbtree testing -> 23675 cycles
>> [1.864996]
>> It took less than a second, meanwhile in your case it didn't finish in
>> 22 seconds.
>>
>>
On 11/02/2015 11:34 AM, Andrey Ryabinin wrote:
>>> >>
>>> >> [1.159450] augmented rbtree testing -> 23675 cycles
>>> >> [1.864996]
>>> >> It took less than a second, meanwhile in your case it didn't finish in
>>> >> 22 seconds.
>>> >>
>>> >> This makes me think that your host is overloaded
2015-11-02 20:39 GMT+03:00 Dave Hansen :
> On 11/02/2015 01:32 AM, Andrey Ryabinin wrote:
>> And the major factor here is number 2.
>>
>> In your dmesg:
>> [ 67.891156] rbtree testing -> 570841 cycles
>> [ 88.609636] augmented rbtree testing
>> [
On 11/02/2015 01:32 AM, Andrey Ryabinin wrote:
> And the major factor here is number 2.
>
> In your dmesg:
> [ 67.891156] rbtree testing -> 570841 cycles
> [ 88.609636] augmented rbtree testing
> [ 116.546697] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s!
> [s
NFIG_KASAN=y)
2. I suspect that your host is overloaded, thus KVM guest runs too slow.
And the major factor here is number 2.
In your dmesg:
[ 67.891156] rbtree testing -> 570841 cycles
[ 88.609636] augmented rbtree testing
[ 116.546697] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s!
[swap
> commit df4c0e36f1b1782b0611a77c52cc240e5c4752dd ("fs: dcache: manually
> unpoison dname after allocation to shut up kasan's reports")
>
>
> The commit fixed a KASan bug, but introduced or revealed a soft lockup
> issue as follows.
>
>
> [ 116.546697] N
t introduced or revealed a soft lockup
issue as follows.
[ 116.546697] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s!
[swapper/0:1]
[ 116.546697] irq event stamp: 3018750
[ 116.546697] hardirqs last enabled at (3018749): []
restore_args+0x0/0x30
[ 116.546697] hardirqs last disab
On 30/06/2015 22:19, Radim Krčmář wrote:
> Until v2.6.37, Linux used NMI watchdog that utilized IO-APIC and LVT0.
> This series fixes some problems with APICv, restore, and concurrency
> while keeping the monster asleep.
Queued for 4.2.
Paolo
Until v2.6.37, Linux used NMI watchdog that utilized IO-APIC and LVT0.
This series fixes some problems with APICv, restore, and concurrency
while keeping the monster asleep.
Radim Krčmář (3):
KVM: x86: keep track of LVT0 changes under APICv
KVM: x86: properly restore LVT0
KVM: x86: make
., arch_trigger_all_cpu_backtrace) can take
a long time to complete. If IRQs are disabled eventually the NMI
watchdog kicks in and creates more havoc. Avoid by telling the NMI
watchdog everything is ok.
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
Signed-off-by: Kamal Mostafa
---
arch/sparc/kernel
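The fix has roughly this shape (a sketch, not the literal sparc diff; dump_one_cpu_backtrace() is a hypothetical helper):

#include <linux/cpumask.h>
#include <linux/nmi.h>

extern void dump_one_cpu_backtrace(int cpu);	/* hypothetical helper */

static void dump_all_cpu_backtraces(void)
{
	int cpu;

	for_each_online_cpu(cpu) {
		/*
		 * The dump runs for a long time with IRQs off; pet the
		 * NMI watchdog each iteration so the dump itself is not
		 * reported as a hard lockup.
		 */
		touch_nmi_watchdog();
		dump_one_cpu_backtrace(cpu);
	}
}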
) can take
a long time to complete. If IRQs are disabled eventually the NMI
watchdog kicks in and creates more havoc. Avoid by telling the NMI
watchdog everything is ok.
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
Signed-off-by: Jiri Slaby
---
arch/sparc/kernel/process_64.c | 4
es, when I return to my computer from being away for a
> little while, I noticed:
> Message from syslogd@redacted at Mar 23 XX:XX:XX ...
> kernel:[1059322.470817] NMI watchdog: BUG: soft lockup - CPU#1 stuck
> for 22s! [kswapd0:31]
traces dumped as a part of the watchdog output is th
ticed:
Message from syslogd@redacted at Mar 23 XX:XX:XX ...
kernel:[1059322.470817] NMI watchdog: BUG: soft lockup - CPU#1 stuck
for 22s! [kswapd0:31]
dmesg | grep NMI produced:
[1151200.727734] sending NMI to all CPUs:
[1151200.727812] NMI backtrace for cpu 0
[1151200.764129] INFO: NMI ha
., arch_trigger_all_cpu_backtrace) can take
a long time to complete. If IRQs are disabled eventually the NMI
watchdog kicks in and creates more havoc. Avoid by telling the NMI
watchdog everything is ok.
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
Signed-off-by: Luis Henriques
---
arch/sparc/kernel
., arch_trigger_all_cpu_backtrace) can take
a long time to complete. If IRQs are disabled eventually the NMI
watchdog kicks in and creates more havoc. Avoid by telling the NMI
watchdog everything is ok.
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
---
arch/sparc/kernel
On 3/6/15 12:29 PM, Mike Galbraith wrote:
On Fri, 2015-03-06 at 11:37 -0700, David Ahern wrote:
But, I do not understand how the wrong topology is causing the NMI
watchdog to trigger. In the end there are still N domains, M groups per
domain and P cpus per group. Doesn't the balancing
not understand how the wrong topology is causing the NMI watchdog
> to trigger. In the end there are still N domains, M groups per domain and P
> cpus per group. Doesn't the balancing walk over all of them irrespective of
> physical topology?
Not quite; so for regular load balancing onl
On Fri, 2015-03-06 at 11:37 -0700, David Ahern wrote:
> But, I do not understand how the wrong topology is causing the NMI
> watchdog to trigger. In the end there are still N domains, M groups per
> domain and P cpus per group. Doesn't the balancing walk over all of them
>
g it.
--
But, I do not understand how the wrong topology is causing the NMI
watchdog to trigger. In the end there are still N domains, M groups per
domain and P cpus per group. Doesn't the balancing walk over all of them
irrespective of physical topology?
Here's another data point that
On Fri, 2015-03-06 at 08:01 -0700, David Ahern wrote:
> On 3/5/15 9:52 PM, Mike Galbraith wrote:
> >> CPU970 attaching sched-domain:
> >>domain 0: span 968-975 level SIBLING
> >> groups: 8 single CPU groups
> >> domain 1: span 968-975 level MC
> >> groups: 1 group with 8 cpus
> >>
On 3/6/15 2:12 AM, Peter Zijlstra wrote:
On Thu, Mar 05, 2015 at 09:05:28PM -0700, David Ahern wrote:
Socket(s): 32
NUMA node(s): 4
Urgh, with 32 'cpus' per socket, you still do _8_ sockets per node, for
a total of 256 cpus per node.
Per the response to Mike, the system
On 3/6/15 2:07 AM, Peter Zijlstra wrote:
On Thu, Mar 05, 2015 at 09:05:28PM -0700, David Ahern wrote:
Since each domain is a superset of the lower one each pass through
load_balance regularly repeats the processing of the previous domain (e.g.,
NODE domain repeats the cpus in the CPU domain). Th
On 3/6/15 1:51 AM, Peter Zijlstra wrote:
On Thu, Mar 05, 2015 at 09:05:28PM -0700, David Ahern wrote:
Hi Peter/Mike/Ingo:
Does that make sense or am I off in the weeds?
How much of your story pertains to 3.18? I'm not particularly interested
in anything much older than that.
No. All of the
On 3/5/15 9:52 PM, Mike Galbraith wrote:
CPU970 attaching sched-domain:
domain 0: span 968-975 level SIBLING
groups: 8 single CPU groups
domain 1: span 968-975 level MC
groups: 1 group with 8 cpus
domain 2: span 768-1023 level CPU
groups: 4 groups with 256 cpus per grou
On Thu, Mar 05, 2015 at 09:05:28PM -0700, David Ahern wrote:
> Socket(s): 32
> NUMA node(s): 4
Urgh, with 32 'cpus' per socket, you still do _8_ sockets per node, for
a total of 256 cpus per node.
That's painful. I don't suppose you can really change the hardware, but
that's
On Thu, Mar 05, 2015 at 09:05:28PM -0700, David Ahern wrote:
> Since each domain is a superset of the lower one each pass through
> load_balance regularly repeats the processing of the previous domain (e.g.,
> NODE domain repeats the cpus in the CPU domain). Then multiplying that
> across 1024 cpus
On Thu, Mar 05, 2015 at 09:05:28PM -0700, David Ahern wrote:
> Hi Peter/Mike/Ingo:
>
> Does that make sense or am I off in the weeds?
How much of your story pertains to 3.18? I'm not particularly interested
in anything much older than that.
se cases (e.g.,
> http://www.cs.virginia.edu/stream/) that regularly trigger the NMI
> watchdog with the stack trace:
>
> Call Trace:
> @ [0045d3d0] double_rq_lock+0x4c/0x68
> @ [004699c4] load_balance+0x278/0x740
> @ [008a7b88] __schedule+0x378/0x8e
Hi Peter/Mike/Ingo:
I've been banging my head against this wall for a week now and hoping you or
someone could shed some light on the problem.
On larger systems (256 to 1024 cpus) there are several use cases (e.g.,
http://www.cs.virginia.edu/stream/) that regularly trigger the NMI
watchdog
While trying yesterday's 3.17 git snapshot on an i5-2400 + intel graphics
computer, it seems to go into soft lockup. Still pings but no ssh to it
any more, and active ssh session got the following kind of NMI watchdog
soft lockups during aptitude list update:
essage from syslogd@ilves at Sep 27
On Sun, Jan 06, 2013 at 11:53:41PM +0600, Alexander E. Patrakov wrote:
> As illustrated by the screen photo [1] (sorry for bad quality),
> disabling and then reenabling the NMI watchdog panics the kernel
> (3.7.1). This is reproducible in qemu-kvm as well, but I could not get
> th
new location of the NMI watchdog info
Commit 9919cba7 ("watchdog: Update documentation") moved the
NMI watchdog documentation from nmi_watchdog.txt to
lockup-watchdogs.txt. Update the index file accordingly.
Signed-off-by: Jean Delvare
Cc: Fernando Luis Vazquez Cao
Cc: Randy Dunla
Revert commit 45226e9 (NMI watchdog: fix for lockup detector breakage
on resume) which breaks resume from system suspend on my SH7372
Mackerel board (by causing a NULL pointer dereference to happen) and
is generally wrong, because it abuses CPU hotplug notifications in a
shamelessly blatant way
> end_request: I/O error, dev fd0, sector 0
> end_request: I/O error, dev fd0, sector 0
> kjournald starting. Commit interval 5 seconds
> EXT3-fs: mounted filesystem with ordered data mode.
> Real Time Clock Driver v1.12ac
>
> Fedora release 8 (Werewolf)
> Kernel 2.
equest: I/O error, dev fd0, sector 0
end_request: I/O error, dev fd0, sector 0
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Real Time Clock Driver v1.12ac
Fedora release 8 (Werewolf)
Kernel 2.6.23.16 on an i686
volcano.underworld login: BUG: NM
ng
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
BUG: NMI Watchdog detected LOCKUP on CPU0, eip c0102b07, registers:
Modules linked in:
Pid: 0, comm: swapper Not tainted (2.6.24.1 #1)
EIP: 0060:[] EFLAGS: 0246 CPU: 0
EIP is at default_idle+0x2f/0x
t any reference of hpet in my dmesg.
- oprofile module is not loaded.
- My interrupts seem to have IO-APIC-* mode.
- My chipset is a nvidia CK804.
Can anyone shed some light on what is needed to get a working NMI watchdog?
Sincerely,
Maarten Maathuis.