On Mon, Oct 22, 2018 at 06:16:13PM +0200, Jiri Olsa wrote:
> On Mon, Oct 22, 2018 at 10:07:38AM -0400, Don Zickus wrote:
> > (adding Jiri)
> >
> > On Fri, Oct 19, 2018 at 09:44:01PM -0700, David Miller wrote:
> > > From: David Miller
> > > Dat
> >header->misc.
> >
> > 2) Use this to elide the map groups clone in
> >thread__clone_map_groups().
>
> Looking into code history, I notice:
>
> commit 363b785f3805a2632eb09a8b430842461c21a640
> Author: Don Zickus
> Date: Fri Mar 14 10:43:4
ile or a directory.
>
> The behaviors are now:
>
> --mpath Read only the specific file as file
> --mpath Read all files in as
> files
> --mpath --find-maintainer-files
> Recurse through and read all files named
> MAINTAINERS
On Mon, Jul 30, 2018 at 12:43:34PM -0700, Sinan Kaya wrote:
> Hi Don,
>
> On 7/30/2018 12:28 PM, Don Zickus wrote:
> > > [0.152492] NMI watchdog: Perf event create on CPU 0 failed with -2
> > > [0.156002] NMI watchdog: Perf NMI watchdog permanently disabled
On Mon, Jul 30, 2018 at 12:09:47PM -0700, Sinan Kaya wrote:
> Reducing the verbosity level to debug for people that are interested in
> debugging watchdog issues.
>
> [0.152492] NMI watchdog: Perf event create on CPU 0 failed with -2
> [0.156002] NMI watchdog: Perf NMI watchdog permanently
On Mon, Jul 16, 2018 at 05:20:19PM -0400, Don Zickus wrote:
> On Fri, Jul 13, 2018 at 05:11:58PM -0700, Joe Perches wrote:
> > On Fri, 2018-07-13 at 14:51 -0400, Don Zickus wrote:
> > > On Fri, Jul 06, 2018 at 03:14:28PM -0700, Joe Perches wrote:
> > > > On Fri,
On Fri, Jul 13, 2018 at 05:11:58PM -0700, Joe Perches wrote:
> On Fri, 2018-07-13 at 14:51 -0400, Don Zickus wrote:
> > On Fri, Jul 06, 2018 at 03:14:28PM -0700, Joe Perches wrote:
> > > On Fri, 2018-07-06 at 15:09 -0700, Joe Perches wrote:
> > > > On Fri, 2018-07
On Fri, Jul 06, 2018 at 03:14:28PM -0700, Joe Perches wrote:
> On Fri, 2018-07-06 at 15:09 -0700, Joe Perches wrote:
> > On Fri, 2018-07-06 at 17:58 -0400, Don Zickus wrote:
> > > We have an internal use case of multiple MAINTAINER files, some folks have
> > > more right
On Fri, Jul 06, 2018 at 03:09:17PM -0700, Joe Perches wrote:
> On Fri, 2018-07-06 at 17:58 -0400, Don Zickus wrote:
> > On Fri, Jul 06, 2018 at 02:36:28PM -0700, Joe Perches wrote:
> > > > > > > Just trying to find ways to minimize our collection of
On Fri, Jul 06, 2018 at 02:36:28PM -0700, Joe Perches wrote:
> > > > > Just trying to find ways to minimize our collection of private
> > > > > patches.
> > > >
> > > > Perhaps that could be extended for your purpose
> > > > with some additional argument like a specific
> > > > optional directory
On Fri, Jul 06, 2018 at 11:31:13AM -0700, Joe Perches wrote:
> On Fri, 2018-07-06 at 13:54 -0400, Don Zickus wrote:
> > On Tue, Jun 26, 2018 at 01:16:11PM -0700, Joe Perches wrote:
> > > On Tue, 2018-06-26 at 14:25 -0400, Prarit Bhargava wrote:
> > > > OSes have addi
On Tue, Jun 26, 2018 at 01:16:11PM -0700, Joe Perches wrote:
> On Tue, 2018-06-26 at 14:25 -0400, Prarit Bhargava wrote:
> > OSes have additional maintainers that should be cc'd on patches or may
> > want to circulate internal patches.
> >
> > Parse the .get_maintainer.MAINTAINERS file. Entries i
corresponding entry in Documentation/admin-guide/kernel-parameters.txt.
Acked-by: Don Zickus
>
> Signed-off-by: Scott Wood
> ---
> Documentation/admin-guide/kernel-parameters.txt | 3 +++
> Documentation/sysctl/kernel.txt | 14 ++
> 2 files cha
Commit-ID: 42f930da7f00c0ab23df4c7aed36137f35988980
Gitweb: https://git.kernel.org/tip/42f930da7f00c0ab23df4c7aed36137f35988980
Author: Don Zickus
AuthorDate: Wed, 1 Nov 2017 14:11:27 -0400
Committer: Thomas Gleixner
CommitDate: Wed, 1 Nov 2017 21:18:40 +0100
watchdog/hardlockup/perf
Commit-ID: c7254c8aabe3025770fdb6f2d84aded11716ca2b
Gitweb: https://git.kernel.org/tip/c7254c8aabe3025770fdb6f2d84aded11716ca2b
Author: Don Zickus
AuthorDate: Wed, 1 Nov 2017 14:11:27 -0400
Committer: Thomas Gleixner
CommitDate: Wed, 1 Nov 2017 20:41:28 +0100
watchdog/hardlockup/perf
On Tue, Oct 31, 2017 at 03:11:07PM -0700, Guenter Roeck wrote:
> On Tue, Oct 31, 2017 at 10:32:00PM +0100, Thomas Gleixner wrote:
>
> [ ...]
>
> > So we have to revert
> >
> > a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
> >
> > Patch attached.
> >
>
> Tested-by
> > Is Chrome OS changing the default timeout from 10s to something else?
> > That would explain it as a script is executed late in the boot cycle and
> > explain the quick restart.
> >
>
> Correct, Chrome OS changes the timeout from 10 to 5 seconds.
>
> A little experiment suggests that the pr
On Tue, Oct 31, 2017 at 10:16:22AM -0700, Guenter Roeck wrote:
> On Tue, Oct 31, 2017 at 02:48:50PM +0100, Peter Zijlstra wrote:
> > On Mon, Oct 30, 2017 at 03:45:12PM -0700, Guenter Roeck wrote:
> > > I added some logging and a long msleep() in
> > > hardlockup_detector_perf_cleanup().
> > > Here
On Mon, Oct 30, 2017 at 03:45:12PM -0700, Guenter Roeck wrote:
> Hi Thomas,
>
> we are seeing the following crash in v4.14-rc5/rc7 if
> CONFIG_HARDLOCKUP_DETECTOR
> is enabled.
>
> [5.908021] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
> [5.915836]
> =
inux/kernel/git/tip/tip.git WIP.core/urgent
>
> That's based on 4.13 final so it neither contains 4.14 nor -next material.
Tested your changes on 4.14-rc3 and it passes my tests. Thanks!
Tested-and-Reviewed-by: Don Zickus
On Mon, Oct 02, 2017 at 07:32:57PM +, Thomas Gleixner wrote:
> On Mon, 2 Oct 2017, Linus Torvalds wrote:
> > Side note: would it perhaps make sense to have that
> > cpus_read_lock/unlock() sequence around the whole reconfiguration
> > section?
> >
> > Because while looking at that sequence, it
On Tue, Sep 26, 2017 at 09:36:03AM +, Colin King wrote:
> From: Colin Ian King
>
> trivial fix to spelling mistake in pr_info message
Acked-by: Don Zickus
>
> Signed-off-by: Colin Ian King
> ---
> kernel/watchdog_hld.c | 2 +-
> 1 file changed, 1 insertion(+),
tine.
> >
> > Signed-off-by: Thomas Gleixner
> > Cc: Don Zickus
> > Cc: Chris Metcalf
> > Cc: linux-par...@vger.kernel.org
> > Cc: Peter Zijlstra
> > Cc: Sebastian Siewior
> > Cc: Nicholas Piggin
> > Cc: Ulrich Obergfell
> >
h makes it unreadable
>
> - There is more wreckage, but see the changelogs for the ugly details.
>
Aside from the simple compile issue in patch 25, I have no issues with this
patchset. Thanks Thomas!
Reviewed-by: Don Zickus
> The following series sanitizes the facility and addresses
homas Gleixner
> Cc: Don Zickus
> Cc: Chris Metcalf
> Cc: Peter Zijlstra
> Cc: Sebastian Siewior
> Cc: Nicholas Piggin
> Cc: Ulrich Obergfell
> Cc: Borislav Petkov
> Cc: Andrew Morton
> Link: http://lkml.kernel.org/r/20170831073054.997264...@linutronix.de
>
>
On Thu, Aug 31, 2017 at 09:15:58AM +0200, Thomas Gleixner wrote:
> The lockup detector is broken in several ways:
>
> - It's deadlock prone vs. CPU hotplug in various ways. Some of these
> are due to recursive cpus_read_lock() others are due to
> cpus_read_lock() from CPU hotplug c
On Thu, Aug 31, 2017 at 09:16:22AM +0200, Thomas Gleixner wrote:
> The watchdog tries to create perf events even after it figured out that
> perf is not functional or the requested event is not supported.
>
> That's braindead as this can be done once at init time and if not supported
> the NMI wat
On Mon, Sep 04, 2017 at 02:10:50PM +0200, Peter Zijlstra wrote:
> On Mon, Sep 04, 2017 at 01:09:06PM +0200, Ulrich Obergfell wrote:
>
> > - A thread hogs CPU N (soft lockup) so that watchdog/N is unable to run.
> > - A user re-configures 'watchdog_thresh' on the fly. The reconfiguration
> > requ
On Fri, Sep 01, 2017 at 09:29:07PM +0200, Thomas Gleixner wrote:
> On Fri, 1 Sep 2017, Don Zickus wrote:
> > On Thu, Aug 31, 2017 at 09:16:08AM +0200, Thomas Gleixner wrote:
> > > The following deadlock is possible in the watchdog hotplug code:
> > > >
On Thu, Aug 31, 2017 at 09:16:15AM +0200, Thomas Gleixner wrote:
> The lockup detector reconfiguration tears down all watchdog threads when
> the watchdog is disabled and sets them up again when it's enabled.
>
> That's a pointless exercise. The watchdog threads are not consuming an
> insane amount
On Thu, Aug 31, 2017 at 09:16:08AM +0200, Thomas Gleixner wrote:
> The following deadlock is possible in the watchdog hotplug code:
>
> cpus_write_lock()
> ...
> takedown_cpu()
> smpboot_park_threads()
> smpboot_park_thread()
> kthread_park()
>
assumed an
arch went one way or the other.
I think this is a good workaround for now.
Acked-by: Don Zickus
>
> Fixes: 05a4a9527931 ("kernel/watchdog: split up config options")
> Signed-off-by: Nicholas Piggin
> ---
>
> arch/powerpc/Kconfig | 2 +-
> arch/x86/
where we register an nmi handler.
Acked-by: Don Zickus
>
> Signed-off-by: Scott Wood
> ---
> arch/x86/kernel/nmi.c | 18 +-
> 1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> index 446c8aa0
On Mon, Jul 17, 2017 at 01:24:23AM +, Liang, Kan wrote:
> Hi Don & Thomas,
>
> Sorry for the late response. We just finished the tests for all proposed
> patches.
>
> There are three proposed patches so far.
> Patch 1: The patch as above which speed up the hrtimer.
> Patch 2: Thomas's first
On Thu, Jun 29, 2017 at 09:12:20AM -0700, Andi Kleen wrote:
> On Thu, Jun 29, 2017 at 11:44:06AM -0400, Don Zickus wrote:
> > On Wed, Jun 28, 2017 at 01:14:04PM -0700, Andi Kleen wrote:
> > > It can be a useful debugging tool for a specific class of bugs:
> > > when
On Wed, Jun 28, 2017 at 01:14:04PM -0700, Andi Kleen wrote:
> It can be a useful debugging tool for a specific class of bugs:
> when kernel software is looping forever.
>
> But if that happens does it really matter how many iterations the
> loop does before it is stopped?
>
> Even the current ti
On Tue, Jun 27, 2017 at 04:48:22PM -0700, Andi Kleen wrote:
> > I haven't heard back any test result yet.
> >
> > The above patch looks good to me.
>
> This needs performance testing. It may slow down performance or latency
> sensitive workloads.
More motivation to work through the issues with
On Tue, Jun 27, 2017 at 08:49:19PM +, Liang, Kan wrote:
>
> > On Mon, Jun 26, 2017 at 04:19:27PM -0400, Don Zickus wrote:
> > > On Fri, Jun 23, 2017 at 11:50:25PM +0200, Thomas Gleixner wrote:
> > > > On Fri, 23 Jun 2017, Don Zickus wrote:
> > > >
On Mon, Jun 26, 2017 at 04:19:27PM -0400, Don Zickus wrote:
> On Fri, Jun 23, 2017 at 11:50:25PM +0200, Thomas Gleixner wrote:
> > On Fri, 23 Jun 2017, Don Zickus wrote:
> > > Hmm, all this work for a temp fix. Kan, how much longer until the real
> > > fix
> >
On Fri, Jun 23, 2017 at 11:50:25PM +0200, Thomas Gleixner wrote:
> On Fri, 23 Jun 2017, Don Zickus wrote:
> > Hmm, all this work for a temp fix. Kan, how much longer until the real fix
> > of having perf count the right cycles?
>
> Quite a while. The approach is wilfully br
On Fri, Jun 23, 2017 at 10:01:55AM +0200, Thomas Gleixner wrote:
> On Thu, 22 Jun 2017, Don Zickus wrote:
> > On Wed, Jun 21, 2017 at 11:53:57PM +0200, Thomas Gleixner wrote:
> > > On Wed, 21 Jun 2017, kan.li...@intel.com wrote:
> > > > We now have more and more sy
On Wed, Jun 21, 2017 at 11:53:57PM +0200, Thomas Gleixner wrote:
> On Wed, 21 Jun 2017, kan.li...@intel.com wrote:
> > We now have more and more systems where the Turbo range is wide enough
> > that the NMI watchdog expires faster than the soft watchdog timer that
> > updates the interrupt tick the
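The timing race described in the quoted patch can be sketched with a little arithmetic. This is a minimal illustration, not the kernel's code. It assumes the perf-based NMI watchdog counts core cycles with a period sized for watchdog_thresh seconds at the nominal frequency, and that the soft watchdog timer fires every 2 * watchdog_thresh / 5 seconds. The function names and the frequencies used below are hypothetical.

```c
#include <stdint.h>

/*
 * Assumption: the NMI period is programmed as (nominal frequency * thresh)
 * cycles, so at a higher turbo clock the same cycle count elapses in less
 * wall-clock time. Result is in milliseconds (integer division).
 */
static uint64_t nmi_window_ms(uint64_t thresh_s, uint64_t nominal_mhz,
                              uint64_t turbo_mhz)
{
    return thresh_s * 1000u * nominal_mhz / turbo_mhz;
}

/* Assumption: the soft watchdog timer interval is 2 * thresh / 5 seconds. */
static uint64_t timer_interval_ms(uint64_t thresh_s)
{
    return thresh_s * 2u * 1000u / 5u;
}

/* Nonzero if the NMI can expire before the timer has had a chance to tick. */
static int nmi_can_fire_spuriously(uint64_t thresh_s, uint64_t nominal_mhz,
                                   uint64_t turbo_mhz)
{
    return nmi_window_ms(thresh_s, nominal_mhz, turbo_mhz) <
           timer_interval_ms(thresh_s);
}
```

Under these assumptions, with watchdog_thresh = 10 a part that runs at three times its nominal clock sees the NMI window shrink below the 4 second timer interval, while a part that only doubles its clock does not.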
On Wed, Jun 21, 2017 at 12:40:28PM +, Liang, Kan wrote:
>
> > >
> > > The right fix for mainline can be found here.
> > > perf/x86/intel: enable CPU ref_cycles for GP counter perf/x86/intel,
> > > watchdog: Switch NMI watchdog to ref cycles on x86
> > > https://patchwork.kernel.org/patch/97790
n]
>
> This adds another #ifdef around it.
Thanks!
Acked-by: Don Zickus
>
> Fixes: mmotm ("kernel/watchdog: provide watchdog_nmi_reconfigure() for arch watchdogs")
> Signed-off-by: Arnd Bergmann
> ---
> kernel/watchdog.c | 4
> 1 file changed
On Tue, Jun 20, 2017 at 02:33:09PM -0700, kan.li...@intel.com wrote:
> From: Kan Liang
>
> Some users reported spurious NMI watchdog timeouts.
>
> We now have more and more systems where the Turbo range is wide enough
> that the NMI watchdog expires faster than the soft watchdog timer that
> upd
watchdog patches seem to go via Andrew...
Andrew, suggestions here?
Reviewed-by: Don Zickus
Acked-by: Don Zickus
>
> Thanks,
> Nick
>
> Nicholas Piggin (5):
> watchdog: remove unused declaration
> watchdog: introduce arch_touch_nmi_watchdog()
> watchdog: spl
On Fri, Jun 16, 2017 at 01:59:00AM +1000, Nicholas Piggin wrote:
> On Thu, 15 Jun 2017 11:51:22 -0400
> Don Zickus wrote:
>
> > On Thu, Jun 15, 2017 at 01:04:01PM +1000, Nicholas Piggin wrote:
> > > > +#ifdef CONFIG_HARDLOCKUP_DETECTOR
On Thu, Jun 15, 2017 at 01:04:01PM +1000, Nicholas Piggin wrote:
> > +#ifdef CONFIG_HARDLOCKUP_DETECTOR
> > /* boot commands */
> > /*
> >* Should we panic when a soft-lockup or hard-lockup occurs:
> > @@ -69,9 +73,6 @@ static int __init hardlockup_panic_setup(char *str)
> > return
On Wed, Jun 14, 2017 at 02:11:18AM +1000, Nicholas Piggin wrote:
> > Yeah, if you wouldn't mind. Sorry for dragging this out, but I feel like we
> > are getting close to having this defined properly, which would allow us to
> > split the code up correctly in the future.
>
> How's this for a replacem
On Mon, Jun 12, 2017 at 06:07:39PM +1000, Nicholas Piggin wrote:
> > > This would probably be the right direction to go in, but it will take
> > > slightly more I think. We first need to remove HAVE_NMI_WATCHDOG from
> > > meaning that an arch has its own watchdog and does not want any HLD
> > > st
On Wed, Jun 07, 2017 at 01:50:26PM +1000, Nicholas Piggin wrote:
> >
> > I _think_ having
> >
> > depends on LOCKUP_DETECTOR
> > depends on HAVE_NMI_WATCHDOG || HAVE_PERF_EVENTS_NMI
> > select HARDLOCKUP_DETECTOR_PERF if !HAVE_NMI_WATCHDOG
> >
> > will work because your new definition of HARDLOC
On Tue, Jun 06, 2017 at 02:46:48PM -0500, Babu Moger wrote:
> Hi Don, Nicholas,
>
>
> On 6/6/2017 11:08 AM, Don Zickus wrote:
> > (adding Babu)
> >
> > On Tue, May 30, 2017 at 11:26:55AM +1000, Nicholas Piggin wrote:
> > > Since last time:
> > >
On Sat, Jun 03, 2017 at 04:10:05PM +1000, Nicholas Piggin wrote:
> > My last concern is wrapping my head around the config options.
> >
> > HAVE_NMI_WATCHDOG seems to have a dual meaning, I think.
>
> Yeah it's not the clearest. I think we need another pass over config
> options to start straight
(adding Babu)
On Tue, May 30, 2017 at 11:26:55AM +1000, Nicholas Piggin wrote:
> Since last time:
>
> - Have the perf based hardlockup detector use arch_touch_nmi_watchdog()
> rather than hld_touch_nmi_watchdog(). This changes direction slightly
> to make the perf-based hard lockup detector a
On Tue, May 30, 2017 at 11:26:58AM +1000, Nicholas Piggin wrote:
> Split SOFTLOCKUP_DETECTOR from LOCKUP_DETECTOR, and split
> HARDLOCKUP_DETECTOR_PERF from HARDLOCKUP_DETECTOR.
>
> LOCKUP_DETECTOR provides the boot, sysctl, and programming interfaces
> for lockup detectors. An architecture that d
On Fri, May 26, 2017 at 10:39:09AM +1000, Nicholas Piggin wrote:
> On Thu, 25 May 2017 10:08:33 -0400
> Don Zickus wrote:
>
> > On Thu, May 25, 2017 at 06:28:56PM +1000, Nicholas Piggin wrote:
> > > After reconfiguring watchdog sysctls etc., architecture specific
> >
On Fri, May 26, 2017 at 10:31:03AM +1000, Nicholas Piggin wrote:
> On Thu, 25 May 2017 09:55:59 -0400
> Don Zickus wrote:
>
> > On Thu, May 25, 2017 at 06:28:54PM +1000, Nicholas Piggin wrote:
> > > For architectures that define HAVE_NMI_WATCHDOG, instead of havin
On Thu, May 25, 2017 at 06:28:56PM +1000, Nicholas Piggin wrote:
> After reconfiguring watchdog sysctls etc., architecture specific
> watchdogs may not get all their parameters updated.
>
> watchdog_reconfigure() can be implemented to pull the new values
> in and set the arch NMI watchdog.
I unde
On Thu, May 25, 2017 at 06:28:54PM +1000, Nicholas Piggin wrote:
> For architectures that define HAVE_NMI_WATCHDOG, instead of having
> them provide the complete touch_nmi_watchdog() function, just have
> them provide arch_touch_nmi_watchdog().
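The split described above can be sketched as follows. This is a minimal user-space illustration, assuming the generic wrapper simply calls the arch hook and then touches the softlockup watchdog. The counters are hypothetical stand-ins for real watchdog state, and the function bodies are not taken from the patch.

```c
/* Hypothetical counters standing in for real watchdog state. */
static int arch_touches;
static int soft_touches;

/* What an architecture defining HAVE_NMI_WATCHDOG would supply. */
static void arch_touch_nmi_watchdog(void)
{
    arch_touches++;
}

/* Stand-in for the generic softlockup touch. */
static void touch_softlockup_watchdog(void)
{
    soft_touches++;
}

/*
 * Generic wrapper: callers keep using touch_nmi_watchdog(), and the generic
 * code decides what else happens around the arch hook.
 */
static void touch_nmi_watchdog(void)
{
    arch_touch_nmi_watchdog();
    touch_softlockup_watchdog();
}
```

Keeping the wrapper in generic code means any extra behaviour can be added in one place without touching each architecture.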
>
> This gives the generic code more flexibility in i
On Sat, May 20, 2017 at 12:53:06AM +1000, Nicholas Piggin wrote:
> > I am curious to know what IBM thinks there. Currently the HARDLOCKUP
> > detector sits on top of perf. I get the impression, you are removing that
> > dependency. Is that a permanent thing or are you thinking of switching back
On Fri, May 19, 2017 at 09:07:31AM +1000, Nicholas Piggin wrote:
> On Thu, 18 May 2017 12:30:28 -0400
> Don Zickus wrote:
>
> > (adding Uli)
> >
> > On Fri, May 19, 2017 at 01:50:26AM +1000, Nicholas Piggin wrote:
> > > I'd like to make it easier
(adding Uli)
On Fri, May 19, 2017 at 01:50:26AM +1000, Nicholas Piggin wrote:
> I'd like to make it easier for architectures that have their own NMI /
> hard lockup detector to reuse various configuration interfaces that are
> provided by generic detectors (cmdline, sysctl, suspend/resume calls).
On Tue, Mar 07, 2017 at 08:00:33AM -0800, Mike Travis wrote:
>
>
> On 3/7/2017 7:22 AM, Don Zickus wrote:
> > On Tue, Mar 07, 2017 at 08:42:10AM +0100, Ingo Molnar wrote:
> >>
> >> * Mike Travis wrote:
> >>
> >>> Add a new NMI call
On Tue, Mar 07, 2017 at 08:42:10AM +0100, Ingo Molnar wrote:
>
> * Mike Travis wrote:
>
> > Add a new NMI call chain that is called last after all other NMI handlers
> > have been checked and did not "handle" the NMI. This mimics the current
> > NMI_UNKNOWN call chain except it eliminates the W
(cc'ing Andrew)
On Tue, Jan 03, 2017 at 04:19:50PM -0500, Prarit Bhargava wrote:
>
>
> On 12/01/2016 03:06 PM, Don Zickus wrote:
> > On Tue, Nov 29, 2016 at 08:15:21AM -0500, Prarit Bhargava wrote:
> >> When CONFIG_BOOTPARAM_HOTPLUG_CPU0 is enabled, the socket con
On Sat, Dec 10, 2016 at 01:41:03PM +0100, Greg Kroah-Hartman wrote:
> On Fri, Dec 09, 2016 at 11:46:54PM +0100, Dodji Seketeli wrote:
> > Hello,
> >
> > Nicholas Piggin a écrit:
> >
> > [...]
> >
> > > That said, a dwarf based checker tool should be able to do as good a job
> > > (maybe a bit b
On Fri, Dec 09, 2016 at 01:50:41PM +1000, Nicholas Piggin wrote:
> >
> > We have plenty of customers with 10 year old drivers, where the expertise
> > has long left the company. The engineers still around, recompile and make
> > tweaks to get things working on the latest RHEL. Verify it passes t
Signed-off-by: Don Zickus
---
include/linux/nmi.h | 1 +
kernel/watchdog.c | 9 +
kernel/watchdog_hld.c | 3 +++
3 files changed, 13 insertions(+)
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 0ea0a38..67e3392 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
he last watchdog thread is
> disabled.
>
> This patch is based on top of linux-next akpm-base.
It passed my tests. Thanks!
Acked-by: Don Zickus
>
> Signed-off-by: Prarit Bhargava
> Cc: Borislav Petkov
> Cc: Tejun Heo
> Cc: Don Zickus
> Cc: Hidehiro Kawai
>
On Thu, Dec 01, 2016 at 05:06:11PM +0100, Greg Kroah-Hartman wrote:
> On Thu, Dec 01, 2016 at 10:40:59AM -0500, Don Zickus wrote:
> > Unfortunately, there are various drivers that will never go upstream
> >
> > - paid storage drivers that provide bells and whistles on top
On Thu, Dec 01, 2016 at 07:26:09AM -0800, Christoph Hellwig wrote:
> On Thu, Dec 01, 2016 at 10:20:39AM -0500, Don Zickus wrote:
> >
> > - provide the memory allocation (instead of having the driver statically
> > allocate)
> > - provide functions to retrieve variou
On Thu, Dec 01, 2016 at 03:32:15PM +1100, Nicholas Piggin wrote:
> > Anyway, MODVERSIONS is our way of protecting our kabi for the last 10 years.
> > It isn't perfect and we have fixed the genksyms tool over the years, but so
> > far it mostly works fine.
>
> Okay. It would be good to get all the
On Wed, Nov 30, 2016 at 10:40:02AM -0800, Linus Torvalds wrote:
> On Wed, Nov 30, 2016 at 10:18 AM, Nicholas Piggin wrote:
> >
> > Here's an initial rough hack at removing modversions. It gives an idea
> > of the complexity we're carrying for this feature (keeping in mind most
> > of the lines rem
nd verify things later this week.
Cheers,
Don
>
> Signed-off-by: Prarit Bhargava
> Cc: Borislav Petkov
> Cc: Tejun Heo
> Cc: Don Zickus
> Cc: Hidehiro Kawai
> Cc: Thomas Gleixner
> Cc: Andi Kleen
> Cc: Joshua Hunt
> Cc: Ingo Molnar
> Cc: Babu Moger
>
me.
Hmm, it occurred to me: with pr_info_once, what happens if you disable and
re-enable? Is this still printed?
echo 0 > /proc/sys/kernel/watchdog
echo 1 > /proc/sys/kernel/watchdog
Cheers,
Don
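The question can be explored with a user-space stand-in. This is a minimal sketch, assuming a print-once macro keeps one static flag per call site (as print-once helpers typically do). The macro info_once, the counter, and both functions are hypothetical and are not the kernel's implementation.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counter so the behaviour can be checked. */
static int lines_printed;

/* User-space stand-in for a print-once macro: one static flag per call site. */
#define info_once(msg)                      \
    do {                                    \
        static bool printed_already;        \
        if (!printed_already) {             \
            printed_already = true;         \
            lines_printed++;                \
            fputs(msg, stdout);             \
        }                                   \
    } while (0)

/* Stand-in for the enable path that announces the watchdog. */
static void watchdog_enable_sketch(void)
{
    info_once("NMI watchdog: enabled\n");
}

/* Stand-in for the disable path; it does not reset the flag. */
static void watchdog_disable_sketch(void)
{
}
```

Under that assumption, an enable, disable, enable sequence prints the message only on the first enable, because nothing resets the flag.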
>
> Signed-off-by: Prarit Bhargava
> Cc: Borislav Petkov
> Cc: Tejun Heo
> Cc: Don Zic
o implement their own
> handlers. watchdog_nmi_enable and watchdog_nmi_disable will be defined
> as weak such that architectures can override its definitions.
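A minimal sketch of the weak-definition pattern described here, assuming GCC or Clang's __attribute__((weak)) in user space (the kernel spells this __weak). The function names and signatures follow the thread. The bodies and the caller enable_cpus_sketch are illustrative only.

```c
/*
 * Generic defaults. Because they are weak, an architecture can supply a
 * strong definition with the same name in another file and the linker
 * will pick that one instead.
 */
__attribute__((weak)) int watchdog_nmi_enable(unsigned int cpu)
{
    (void)cpu;
    return 0;   /* nothing to set up by default */
}

__attribute__((weak)) void watchdog_nmi_disable(unsigned int cpu)
{
    (void)cpu;
}

/* Hypothetical generic caller that does not care which definition won. */
static int enable_cpus_sketch(unsigned int ncpus)
{
    for (unsigned int cpu = 0; cpu < ncpus; cpu++) {
        int err = watchdog_nmi_enable(cpu);

        if (err)
            return err;
    }
    return 0;
}
```

With only this file linked, the weak defaults are used and the caller succeeds for every CPU.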
>
> Thanks to Don Zickus for his suggestions.
> Here are our previous discussions
> http://www.spinics.net/lists/sparc
On Mon, Oct 31, 2016 at 04:30:59PM -0500, Babu Moger wrote:
>
> On 10/31/2016 4:00 PM, Don Zickus wrote:
> >On Wed, Oct 26, 2016 at 09:02:19AM -0700, Babu Moger wrote:
> >>This is an attempt to cleanup watchdog handlers. Right now,
> >>kernel/watchdog.c implements
o implement their own
> handlers. watchdog_nmi_enable and watchdog_nmi_disable will be defined
> as weak such that architectures can override its definitions.
>
> Thanks to Don Zickus for his suggestions.
> Here is the previous discussion
> http://www.spinics.net/lists/sparclinux/msg
o implement their own
> handlers. watchdog_nmi_enable and watchdog_nmi_disable will be defined
> as weak such that architectures can override its definitions.
Thanks for the patches Babu. I will try to get to them today or tomorrow.
Cheers,
Don
>
> Thanks to Don Zickus for his suggestio
On Fri, Oct 21, 2016 at 04:50:21PM -0500, Babu Moger wrote:
> Don,
>
> On 10/21/2016 2:19 PM, Andrew Morton wrote:
> >On Fri, 21 Oct 2016 11:11:14 -0400 Don Zickus wrote:
> >
> >>On Thu, Oct 20, 2016 at 08:25:27PM -0700, Andrew Morton wrote:
> >>>On
On Thu, Oct 20, 2016 at 08:25:27PM -0700, Andrew Morton wrote:
> On Thu, 20 Oct 2016 12:14:14 -0400 Don Zickus wrote:
>
> > > > -static int watchdog_nmi_enable(unsigned int cpu) { return 0; }
> > > > -static void watchdog_nmi_disable(unsigned int cpu) { return; }
On Wed, Oct 19, 2016 at 05:00:12PM -0700, Andrew Morton wrote:
> On Thu, 13 Oct 2016 13:38:01 -0700 Babu Moger wrote:
>
> > Currently we do not have a way to enable/disable arch specific
> > watchdog handlers if it was implemented by any of the architectures.
> >
> > This patch introduces new fu
on x86.
Thanks Babu!
Tested-and-Reviewed-by: Don Zickus
>
> v3:
> Made one more change per Don Zickus comments.
> Moved failure path messages into generic code inside watchdog_nmi_enable.
> Also added matching prints in sparc to warn about the failure.
>
> v2:
>
On Thu, Oct 13, 2016 at 01:38:01PM -0700, Babu Moger wrote:
> Currently we do not have a way to enable/disable arch specific
> watchdog handlers if it was implemented by any of the architectures.
>
> This patch introduces new functions arch_watchdog_nmi_enable and
> arch_watchdog_nmi_disable which
rg's comments about making the definitions visible.
> With the new approach we don't need those definitions (NMI_WATCHDOG_ENABLED,
> SOFT_WATCHDOG_ENABLED etc.) outside watchdog.c. So no action.
>
> b) Made changes per Don Zickus comments.
> Don, I could not use your p
On Thu, Oct 06, 2016 at 03:16:41PM -0700, Babu Moger wrote:
> During our testing we noticed that nmi watchdogs in sparc could not be
> disabled or enabled dynamically using sysctl/proc interface. Sparc uses
> its own arch specific nmi watchdogs. There is a sysctl and proc
> interface(proc/sy
On Wed, Sep 21, 2016 at 11:18:29AM +0200, Jiri Olsa wrote:
> On Wed, Sep 21, 2016 at 09:08:40AM +, Stanislav Ievlev wrote:
> > Hi, Jiri!
> >
> > Why are you not using unsigned integer for counters in c2c_stats structure?
>
> hi,
> never really thought of that, because that's one of the origin
tchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
> > NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
> >
> > There doesn't appear to be a reason for doing this work every time a write
> > occurs,
On Fri, Mar 18, 2016 at 05:48:58PM +0100, Peter Zijlstra wrote:
> On Fri, Mar 18, 2016 at 05:44:41PM +0100, Peter Zijlstra wrote:
> > On Fri, Mar 18, 2016 at 12:37:48PM -0400, Don Zickus wrote:
> > > Would something like this be a better patch?
> >
> > > -#define
- pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
+ pr_emerg("Detected soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
smp_processor_id(), duration,
current->comm, task_pid_nr(c
l of the watchdogs (on each cpu). Perhaps a corner case
> : will pop up (the scheduler?? to mimic touch_all_softlockup_watchdogs() ).
> :
> : But this does address an issue where if a system is locked up and one cpu
> : is spewing out useful debug messages (or error messages), the hard
, permanently consumes one hw-PMU counter.
> [ 955.783182] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
>
> There doesn't appear to be a reason for doing this work every time a write
> occurs, so only do it when the values change.
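The "only do it when the values change" idea can be sketched as a simple comparison before reconfiguring. This is a minimal illustration, not the patch itself. The struct, the counter, and both functions are hypothetical names chosen for the example.

```c
/* Hypothetical snapshot of the tunables a write can affect. */
struct watchdog_settings {
    int enabled;
    int thresh;
};

/* Hypothetical counter so the effect can be observed. */
static int reconfigure_calls;

/* Stand-in for the expensive work (stopping and restarting watchdogs). */
static void reconfigure_sketch(const struct watchdog_settings *s)
{
    (void)s;
    reconfigure_calls++;
}

/*
 * Apply a requested write. Returns 1 if the settings changed and the
 * watchdog was reconfigured, 0 if the write was a no-op.
 */
static int write_settings_sketch(struct watchdog_settings *cur,
                                 struct watchdog_settings req)
{
    if (cur->enabled == req.enabled && cur->thresh == req.thresh)
        return 0;

    *cur = req;
    reconfigure_sketch(cur);
    return 1;
}
```

A script that writes back the values it just read would then trigger no reconfiguration and no repeated log lines.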
Acked-by:
On Mon, Mar 14, 2016 at 11:02:31PM -0500, Josh Hunt wrote:
> On 03/14/2016 11:29 AM, Don Zickus wrote:
> >
> >Hi Josh,
> >
> >I believe Uli thought the below patch might fix it.
> >
> >Cheers,
> >Don
>
> Don
>
> It looks like I was
On Mon, Mar 14, 2016 at 09:45:26AM -0500, Josh Hunt wrote:
> On 03/14/2016 09:34 AM, Don Zickus wrote:
> >On Sat, Mar 12, 2016 at 06:50:26PM -0500, Joshua Hunt wrote:
> >>While working on a script to restore all sysctl params before a series of
> >>tests I found that
On Sat, Mar 12, 2016 at 06:50:26PM -0500, Joshua Hunt wrote:
> While working on a script to restore all sysctl params before a series of
> tests I found that writing any value into the
> /proc/sys/kernel/{nmi_watchdog,soft_watchdog,watchdog,watchdog_thresh}
> causes them to call proc_watchdog_updat
On Wed, Feb 03, 2016 at 10:23:42AM -0700, Jeffrey Merkey wrote:
> > Hmm, I am confused here. So you are saying because we are in the nmi
> > handler you can not break into the system? The nmi handler prints some
> > stuff to the screen, pokes the other cpus to print stuff to the screen and
> > th