Public bug reported:

We've run into issues where upgrading the kernel from the 4.10 series to the 4.13 series on Ubuntu 16.04 hosts that make heavy use of inotify causes kernel panics and lockups in inotify-related code. Our particular use case hit these at a rate of roughly one every 30 minutes when serving production traffic. Unfortunately, I have so far been unable to reproduce the issue in a simulated load-testing environment.

When the issue occurs, we get dmesg entries like "BUG: soft lockup - CPU#0 stuck for 22s!" or "General protection fault: 0000 [#1] SMP PTI". In the soft lockup case, the host stays up but all I/O operations stall indefinitely (e.g. typing "sync" into the console hangs forever). In the protection fault case, the system reboots. I've attached dmesg output from the two cases to this bug report.

We have noticed the issue with the following kernels:

- linux-image-4.13.0-1013-gcp
- linux-image-4.13.0-1015-gcp
- linux-image-4.13.0-36-generic

We did _not_ have the issue with:

- linux-image-4.10.0-32-generic

I've submitted this bug report from a system which should be configured identically to our production hosts that were having the issue (the affected hosts were immediately rolled back to 4.10).

This bug appears to have been fixed upstream as of 4.17-rc3 in this commit:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d90a10e2444ba5a351fa695917258ff4c5709fa5

I would guess that this patch should be backported into both the 4.13 HWE and GCP Ubuntu kernel series?

Thanks,
KJ

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.13.0-1013-gcp 4.13.0-1013.17
ProcVersionSignature: Ubuntu 4.13.0-1013.17-gcp 4.13.16
Uname: Linux 4.13.0-1013-gcp x86_64
ApportVersion: 2.20.1-0ubuntu2.16
Architecture: amd64
Date: Mon May 14 07:58:29 2018
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: linux-gcp
UpgradeStatus: No upgrade log present (probably fresh install)

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Incomplete

** Tags: amd64 apport-bug uec-images xenial

** Attachment added: "Dmesg output during soft lockup"
   https://bugs.launchpad.net/bugs/1771075/+attachment/5139122/+files/soft_lockup.log

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-gcp in Ubuntu.
https://bugs.launchpad.net/bugs/1771075

Title:
  General Protection fault in inotify (fixed upstream)

Status in linux package in Ubuntu:
  Incomplete