** Also affects: linux (Ubuntu Bionic)
Importance: High
Assignee: Joseph Salisbury (jsalisbury)
Status: In Progress
** Also affects: linux-hwe (Ubuntu Bionic)
Importance: High
Assignee: Joseph Salisbury (jsalisbury)
Status: In Progress
** Also affects: linux (Ubuntu Artful)
Importance: Undecided
Status: New
** Also affects: linux-hwe (Ubuntu Artful)
Importance: Undecided
Status: New
** Also affects: linux (Ubuntu Zesty)
Importance: Undecided
Status: New
** Also affects: linux-hwe (Ubuntu Zesty)
Importance: Undecided
Status: New
** Changed in: linux (Ubuntu Artful)
Status: New => In Progress
** Changed in: linux (Ubuntu Zesty)
Status: New => Incomplete
** Changed in: linux (Ubuntu Artful)
Importance: Undecided => High
** Changed in: linux (Ubuntu Zesty)
Importance: Undecided => High
** Changed in: linux (Ubuntu Artful)
Assignee: (unassigned) => Joseph Salisbury (jsalisbury)
** Changed in: linux (Ubuntu Zesty)
Assignee: (unassigned) => Joseph Salisbury (jsalisbury)
** No longer affects: linux-hwe (Ubuntu)
** No longer affects: linux-hwe (Ubuntu Zesty)
** No longer affects: linux-hwe (Ubuntu Artful)
** No longer affects: linux-hwe (Ubuntu Bionic)
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1713751
Title:
soft lockup / stall on CPU when shutting down with hwe 4.10 kernel
Status in linux package in Ubuntu:
In Progress
Status in linux source package in Zesty:
Incomplete
Status in linux source package in Artful:
In Progress
Status in linux source package in Bionic:
In Progress
Bug description:
Instead of normal complete shutdowns we're getting soft lockup
failures. This started when 16.04 hwe packages switched to the 4.10
kernel about a month ago. I help manage a few hundred machines
spanning several different sites and several different hardware models,
and they're all experiencing this intermittently: approximately 5% get
stuck on shutdown each day.
Here is an example of what is on the screen after it happens. The
machine is unresponsive and requires a hard reset. I can't see
anything in syslog or dmesg that differs when this happens; I think
all logging has stopped at this point in the shutdown.
[54566.220003]  (t=6450529 jiffies g=141935 c=141934 q=1288)
[54592.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54620.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54648.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54676.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54704.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54732.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54746.232003] INFO: rcu_sched self-detected stall on CPU
[54746.232003]  1-...: (6495431 ticks this GP) idle=5c7/140000000000001/0 softirq=218389/218389 fqs=3247712
This message repeats (about every 28 seconds, going by the timestamps
above); sometimes the reported stuck time is 23s instead of 22s:
... NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s!
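To check how the watchdog reports are spaced, the excerpt above can be run through a small parser. This is only a sketch: the function names (parse_lockups, report_gaps) are made up for illustration, and the regular expression assumes the message wording is exactly as quoted in this report. On the excerpt it shows the reports arriving 28 seconds apart, each one stating a stuck time of 22s.

```python
import re
from decimal import Decimal

# Soft lockup lines copied from the report above; the wrapped
# "(systemd:1)" suffix is omitted because the parser does not need it.
LOG = """\
[54592.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s!
[54620.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s!
[54648.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s!
[54676.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s!
[54704.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s!
[54732.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s!
[54746.232003] INFO: rcu_sched self-detected stall on CPU
"""

# Assumed message format, taken from the lines quoted in this report.
PATTERN = re.compile(
    r"^\[\s*(\d+\.\d+)\] NMI watchdog: BUG: soft lockup - "
    r"CPU#(\d+) stuck for (\d+)s!"
)


def parse_lockups(text):
    """Return (timestamp, cpu, stuck_seconds) for each soft lockup line."""
    events = []
    for line in text.splitlines():
        match = PATTERN.match(line)
        if match:
            # Decimal keeps the microsecond timestamps exact.
            events.append(
                (Decimal(match.group(1)), int(match.group(2)), int(match.group(3)))
            )
    return events


def report_gaps(events):
    """Return the seconds elapsed between consecutive reports."""
    return [later[0] - earlier[0] for earlier, later in zip(events, events[1:])]


events = parse_lockups(LOG)
gaps = report_gaps(events)
print("soft lockup reports:", len(events))
print("CPUs involved:", sorted({cpu for _, cpu, _ in events}))
print("stuck times (s):", sorted({stuck for _, _, stuck in events}))
print("gaps between reports (s):", [str(gap) for gap in gaps])
```

The same functions could be pointed at a photographed-and-transcribed console or a serial console capture, since nothing reaches syslog at this stage of shutdown.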
Reverting to 4.8.0-58 avoids the problem. I believe the problem has been
present with every hwe 4.10 kernel package through the current
linux-image-4.10.0-33-generic. This bug was filed with data right after it
occurred with linux-image-4.10.0-33-generic.
This only happens approximately 5% of the time, with no discernible
pattern. I am able to reproduce the issue on one particular machine
by scheduling shutdowns 3 times per day and waiting up to a few days
for the problem to occur. Shutting down and starting up more
frequently, such as every 5 minutes or even every hour, does not trigger
the problem; it seems the machine needs to have been running for a while.
It does not seem to depend on any user actions: it happens even if you
never log in. It has also happened on reboots, as opposed to shutdowns.
I found a few similar bug reports but nothing for these exact
symptoms.
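As a rough guide to how long a reproduction attempt may take, the figures above can be turned into a back-of-the-envelope estimate. This is a sketch under a strong assumption: that each shutdown hangs independently with probability 5%. The report itself suggests uptime matters, so the real behaviour may differ. The function names are hypothetical.

```python
def p_at_least_one_hang(p_per_shutdown, shutdowns_per_day, days):
    """Probability of seeing one or more hangs in the given number of days.

    Assumes every shutdown hangs independently with the same probability,
    which is a simplification and not something the report establishes.
    """
    attempts = shutdowns_per_day * days
    return 1 - (1 - p_per_shutdown) ** attempts


def expected_shutdowns_until_hang(p_per_shutdown):
    """Mean number of shutdowns up to and including the first hang."""
    return 1 / p_per_shutdown


P_HANG = 0.05   # "approximately 5% of the time", from the report
PER_DAY = 3     # scheduled shutdowns per day, from the report

for days in (1, 3, 7, 14):
    chance = p_at_least_one_hang(P_HANG, PER_DAY, days)
    print(f"{days:2d} day(s): {chance:.0%} chance of at least one hang")

mean_attempts = expected_shutdowns_until_hang(P_HANG)
print(f"mean shutdowns until first hang: {mean_attempts:.0f}")
print(f"mean days at {PER_DAY}/day: {mean_attempts / PER_DAY:.1f}")
```

Under these assumptions a test kernel would need to survive well over a week of scheduled shutdowns before a clean run says much about whether the problem is gone.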
I have tried blacklisting mei_me, with no change in behavior. I'm not
certain, but the majority of the affected machines appear to use Intel
video chips. Next I am going to try a mainline 4.10 kernel.
lsb_release -rd
Description: Ubuntu 16.04.3 LTS
Release: 16.04
apt-cache policy linux-image-4.10.0-33-generic
linux-image-4.10.0-33-generic:
  Installed: 4.10.0-33.37~16.04.1
  Candidate: 4.10.0-33.37~16.04.1
  Version table:
 *** 4.10.0-33.37~16.04.1 500
        500 http://us.archive.ubuntu.com/ubuntu xenial-security/main amd64 Packages
        100 /var/lib/dpkg/status
ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.10.0-33-generic 4.10.0-33.37~16.04.1
ProcVersionSignature: Ubuntu 4.10.0-33.37~16.04.1-generic 4.10.17
Uname: Linux 4.10.0-33-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.10
Architecture: amd64
CurrentDesktop: XFCE
Date: Tue Aug 29 08:57:26 2017
SourcePackage: linux-hwe
UpgradeStatus: No upgrade log present (probably fresh install)
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1713751/+subscriptions