(In reply to Brendan Long from comment #589)
> I strongly suspect that the graphics driver was the problem since my lockups
> would cause the screen to become completely unresponsive, but sound
> continued working, and in one case I had a lockup during a video call and
> the other person could still see and hear me.

What you're describing here is a new "feature" introduced between kernel
4.19.16 and 4.19.17. I can see exactly the same here with radeon hardware.
The system keeps working completely (even VMs on the host are running well)
except for graphics; sometimes even the tty terminals still work. When I ssh
into the machine, I can always see log entries like these:

radeon 0000:0a:00.0: ring 0 stalled for more than 14084msec
radeon 0000:0a:00.0: GPU lockup (current fence id 0x0000000000053ed7 last fence id 0x0000000000053f0f on ring 0)
...

I'm currently trying to narrow it down using git bisect. The suspicious
changes left at the moment are:

2019-01-22  arm64: Don't trap host pointer auth use to EL2  Mark Rutland  bad
2019-01-22  arm64/kvm: consistently handle host HCR_EL2 flags  Mark Rutland
2019-01-22  scsi: target: iscsi: cxgbit: fix csk leak  Varun Prakash
2019-01-22  scsi: target: iscsi: cxgbit: fix csk leak  Varun Prakash
2019-01-22  Revert "scsi: target: iscsi: cxgbit: fix csk leak"  Sasha Levin
2019-01-22  mmc: sdhci-msm: Disable CDR function on TX  Loic Poulain
2019-01-22  netfilter: nf_conncount: fix argument order to find_next_bit
2019-01-22  netfilter: nf_conncount: speculative garbage collection on empty lists  Pablo Neira Ayuso
2019-01-22  netfilter: nf_conncount: move all list iterations under spinlock  Pablo Neira Ayuso
2019-01-22  netfilter: nf_conncount: merge lookup and add functions  Florian Westphal
2019-01-22  netfilter: nf_conncount: restart search when nodes have been erased  Florian Westphal  ?
2019-01-22  netfilter: nf_conncount: split gc in two phases  Florian Westphal
2019-01-22  netfilter: nf_conncount: don't skip eviction when age is negative  Florian Westphal
2019-01-22  netfilter: nf_conncount: replace CONNCOUNT_LOCK_SLOTS with CONNCOUNT_SLOTS  Shawn Bohrer
2019-01-22  can: gw: ensure DLC boundaries after CAN frame modification  Oliver Hartkopp
2019-01-22  tty: Don't hold ldisc lock in tty_reopen() if ldisc present  Dmitry Safonov
2019-01-22  tty: Simplify tty->count math in tty_reopen()  Dmitry Safonov
2019-01-22  tty: Hold tty_ldisc_lock() during tty_reopen()  Dmitry Safonov
2019-01-22  tty/ldsem: Wake up readers after timed out down_write()  Dmitry Safonov

As you're describing correctly, the problem seems to be network related. I'm
getting this error too when watching videos from the internet.

I'm currently testing the changes between "restart search when nodes have
been erased" and "Wake up readers after timed out down_write()".

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1690085

Title:
  Ryzen 1800X freeze - rcu_sched detected stalls on CPUs/tasks

Status in Linux:
  Confirmed
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Hi,

  We are getting various kernel crashes on a pretty new config.
  We're using a Ryzen 1800X CPU with an X370 Gaming Pro Carbon MB (7A32V1)
  using the latest BIOS available (1.52).

  We are running Ubuntu 17.04 (amd64); we've tried different kernel
  versions, the native one and releases from
  http://kernel.ubuntu.com/~kernel-ppa/mainline/ too.

  Tested kernel version:
  native 17.04 kernel
  4.10.15

  Issues are the same, we're getting random freezes on the machine.

  Here is the kern.log entry when it happens:

  May 10 22:41:56 dev2 kernel: [24366.186246] INFO: rcu_sched detected stalls on CPUs/tasks:
  May 10 22:41:56 dev2 kernel: [24366.187618] 0-...: (1 GPs behind) idle=49b/1/0 softirq=28561/28563 fqs=913449
  May 10 22:41:56 dev2 kernel: [24366.188977] (detected by 12, t=1860207 jiffies, g=10001, c=10000, q=4656)
  May 10 22:41:56 dev2 kernel: [24366.190344] Task dump for CPU 0:
  May 10 22:41:56 dev2 kernel: [24366.190345] swapper/0 R running task 0 0 0 0x00000008
  May 10 22:41:56 dev2 kernel: [24366.190348] Call Trace:
  May 10 22:41:56 dev2 kernel: [24366.190354] ? native_safe_halt+0x6/0x10
  May 10 22:41:56 dev2 kernel: [24366.190355] ? default_idle+0x20/0xd0
  May 10 22:41:56 dev2 kernel: [24366.190358] ? arch_cpu_idle+0xf/0x20
  May 10 22:41:56 dev2 kernel: [24366.190360] ? default_idle_call+0x23/0x30
  May 10 22:41:56 dev2 kernel: [24366.190362] ? do_idle+0x16f/0x200
  May 10 22:41:56 dev2 kernel: [24366.190364] ? cpu_startup_entry+0x71/0x80
  May 10 22:41:56 dev2 kernel: [24366.190366] ? rest_init+0x77/0x80
  May 10 22:41:56 dev2 kernel: [24366.190368] ? start_kernel+0x464/0x485
  May 10 22:41:56 dev2 kernel: [24366.190369] ? early_idt_handler_array+0x120/0x120
  May 10 22:41:56 dev2 kernel: [24366.190371] ? x86_64_start_reservations+0x24/0x26
  May 10 22:41:56 dev2 kernel: [24366.190372] ? x86_64_start_kernel+0x14d/0x170
  May 10 22:41:56 dev2 kernel: [24366.190373] ? start_cpu+0x14/0x14
  May 10 22:44:56 dev2 kernel: [24546.188093] INFO: rcu_sched detected stalls on CPUs/tasks:
  May 10 22:44:56 dev2 kernel: [24546.189461] 0-...: (1 GPs behind) idle=49b/1/0 softirq=28561/28563 fqs=935027
  May 10 22:44:56 dev2 kernel: [24546.190823] (detected by 14, t=1905212 jiffies, g=10001, c=10000, q=4740)
  May 10 22:44:56 dev2 kernel: [24546.192191] Task dump for CPU 0:
  May 10 22:44:56 dev2 kernel: [24546.192192] swapper/0 R running task 0 0 0 0x00000008
  May 10 22:44:56 dev2 kernel: [24546.192195] Call Trace:
  May 10 22:44:56 dev2 kernel: [24546.192199] ? native_safe_halt+0x6/0x10
  May 10 22:44:56 dev2 kernel: [24546.192201] ? default_idle+0x20/0xd0
  May 10 22:44:56 dev2 kernel: [24546.192203] ? arch_cpu_idle+0xf/0x20
  May 10 22:44:56 dev2 kernel: [24546.192204] ? default_idle_call+0x23/0x30
  May 10 22:44:56 dev2 kernel: [24546.192206] ? do_idle+0x16f/0x200
  May 10 22:44:56 dev2 kernel: [24546.192208] ? cpu_startup_entry+0x71/0x80
  May 10 22:44:56 dev2 kernel: [24546.192210] ? rest_init+0x77/0x80
  May 10 22:44:56 dev2 kernel: [24546.192211] ? start_kernel+0x464/0x485
  May 10 22:44:56 dev2 kernel: [24546.192213] ? early_idt_handler_array+0x120/0x120
  May 10 22:44:56 dev2 kernel: [24546.192214] ? x86_64_start_reservations+0x24/0x26
  May 10 22:44:56 dev2 kernel: [24546.192215] ? x86_64_start_kernel+0x14d/0x170
  May 10 22:44:56 dev2 kernel: [24546.192217] ? start_cpu+0x14/0x14

  Depending on the kernel version, we also get NMI watchdog errors about
  a stuck CPU (mentioning the CPU core id, which is random).

  The crash happens randomly, but generally after some hours (3-4h).

  We installed kernel 4.11.0-041100-generic #201705041534 this morning
  and are now waiting for a crash...

  For now, the machine is not "used"; at least, it's not CPU stressed...
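
  The two "detected by" lines in the kern.log excerpt above can be compared
  with a short script. This is only a sketch: the names STALL_RE,
  parse_stalls and estimate_hz are made up for illustration, and it assumes
  that the bracketed value is uptime in seconds and that t= counts jiffies
  since the stalled grace period began. Under those assumptions, the
  difference between the two reports gives a rough tick rate, which in turn
  gives a rough idea of how long CPU 0 had already been stalled.

```python
import re

# The two "detected by" lines, copied from the kern.log excerpt above.
LOG = """\
May 10 22:41:56 dev2 kernel: [24366.188977] (detected by 12, t=1860207 jiffies, g=10001, c=10000, q=4656)
May 10 22:44:56 dev2 kernel: [24546.190823] (detected by 14, t=1905212 jiffies, g=10001, c=10000, q=4740)
"""

# Hypothetical helper pattern: uptime timestamp, detecting CPU, stall length.
STALL_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\(detected by (?P<cpu>\d+), t=(?P<t>\d+) jiffies"
)


def parse_stalls(text):
    """Return (uptime_seconds, detecting_cpu, stall_jiffies) per report."""
    return [
        (float(m["ts"]), int(m["cpu"]), int(m["t"]))
        for m in STALL_RE.finditer(text)
    ]


def estimate_hz(stalls):
    """Estimate ticks per second from the first and last report (needs >= 2)."""
    (ts0, _, t0), (ts1, _, t1) = stalls[0], stalls[-1]
    return (t1 - t0) / (ts1 - ts0)


stalls = parse_stalls(LOG)
hz = estimate_hz(stalls)  # assumption: t= grows at the kernel tick rate
for ts, cpu, t in stalls:
    minutes = t / hz / 60
    print(f"uptime {ts:.0f}s: detected by CPU {cpu}, stalled for about {minutes:.0f} min")
```

  If the estimate is right, both reports belong to one long stall that was
  already in progress well before the first line shown here, so earlier
  parts of kern.log may be worth checking as well.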

  Thanks
  ---
  ApportVersion: 2.20.4-0ubuntu4
  Architecture: amd64
  DistroRelease: Ubuntu 17.04
  InstallationDate: Installed on 2017-05-09 (1 days ago)
  InstallationMedia: Ubuntu-Server 17.04 "Zesty Zapus" - Release amd64 (20170412)
  Package: linux (not installed)
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=fr_FR.UTF-8
   SHELL=/bin/bash
  Tags: zesty
  Uname: Linux 4.11.0-041100-generic x86_64
  UnreportableReason: The running kernel is not an Ubuntu kernel
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
  _MarkForUpload: True

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1690085/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp