[Kernel-packages] [Bug 1690085]

kmueller Tue, 16 Apr 2019 17:06:13 -0700

(In reply to Brendan Long from comment #589)
> I strongly suspect that the graphics driver was the problem since my lockups
> would cause the screen to become completely unresponsive, but sound
> continued working, and in one case I had a lockup during a video call and
> the other person could still see and hear me.


What you're describing here is a new "feature" introduced between kernel
4.19.16 and 4.19.17. I can see exactly the same here with radeon
hardware. The system keeps working (even VMs on the host run well)
except for graphics; sometimes even the tty terminals still work. When
I ssh into the machine, I can always see log entries like these:

radeon 0000:0a:00.0: ring 0 stalled for more than 14084msec
radeon 0000:0a:00.0: GPU lockup (current fence id 0x0000000000053ed7 last fence id 0x0000000000053f0f on ring 0)
...
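
For anyone collecting these entries over ssh, here is a minimal Python sketch that pulls the numbers out of the two radeon lines quoted above. The regular expressions and the function name `parse_radeon_lockups` are my own illustration, not part of any tool, and reading "last fence minus current fence" as the count of fences still pending on the ring is an assumption about what the driver reports.

```python
import re

# Sample lines copied from the log excerpt above (wrapped line rejoined).
LOG = """\
radeon 0000:0a:00.0: ring 0 stalled for more than 14084msec
radeon 0000:0a:00.0: GPU lockup (current fence id 0x0000000000053ed7 last fence id 0x0000000000053f0f on ring 0)
"""

STALL_RE = re.compile(r"radeon (\S+): ring (\d+) stalled for more than (\d+)msec")
LOCKUP_RE = re.compile(
    r"radeon (\S+): GPU lockup \(current fence id (0x[0-9a-f]+) "
    r"last fence id (0x[0-9a-f]+) on ring (\d+)\)"
)


def parse_radeon_lockups(text):
    """Return one dict per recognised stall or lockup line."""
    events = []
    for line in text.splitlines():
        m = STALL_RE.search(line)
        if m:
            events.append({
                "kind": "stall",
                "device": m.group(1),
                "ring": int(m.group(2)),
                "stalled_ms": int(m.group(3)),
            })
            continue
        m = LOCKUP_RE.search(line)
        if m:
            current = int(m.group(2), 16)
            last = int(m.group(3), 16)
            events.append({
                "kind": "lockup",
                "device": m.group(1),
                "ring": int(m.group(4)),
                "current_fence": current,
                "last_fence": last,
                # ASSUMPTION: the difference is the number of pending fences.
                "pending": last - current,
            })
    return events


events = parse_radeon_lockups(LOG)
for event in events:
    print(event)
```

For the two lines above this reports a stall of 14084 ms and a fence difference of 0x38 (56) on ring 0.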

I'm currently trying to narrow it down using git bisect. The suspicious
changes left at the moment are:

2019-01-22  arm64: Don't trap host pointer auth use to EL2
            Mark Rutland                bad
2019-01-22  arm64/kvm: consistently handle host HCR_EL2 flags
            Mark Rutland
2019-01-22  scsi: target: iscsi: cxgbit: fix csk leak
            Varun Prakash
2019-01-22  scsi: target: iscsi: cxgbit: fix csk leak
            Varun Prakash
2019-01-22  Revert "scsi: target: iscsi: cxgbit: fix csk leak"
            Sasha Levin
2019-01-22  mmc: sdhci-msm: Disable CDR function on TX
            Loic Poulain
2019-01-22  netfilter: nf_conncount: fix argument order to find_next_bit
2019-01-22  netfilter: nf_conncount: speculative garbage collection on empty lists
            Pablo Neira Ayuso
2019-01-22  netfilter: nf_conncount: move all list iterations under spinlock
            Pablo Neira Ayuso
2019-01-22  netfilter: nf_conncount: merge lookup and add functions
            Florian Westphal

2019-01-22  netfilter: nf_conncount: restart search when nodes have been erased
            Florian Westphal            ?
2019-01-22  netfilter: nf_conncount: split gc in two phases
            Florian Westphal
2019-01-22  netfilter: nf_conncount: don't skip eviction when age is negative
            Florian Westphal
2019-01-22  netfilter: nf_conncount: replace CONNCOUNT_LOCK_SLOTS with CONNCOUNT_SLOTS
            Shawn Bohrer
2019-01-22  can: gw: ensure DLC boundaries after CAN frame modification
            Oliver Hartkopp
2019-01-22  tty: Don't hold ldisc lock in tty_reopen() if ldisc present
            Dmitry Safonov
2019-01-22  tty: Simplify tty->count math in tty_reopen()
            Dmitry Safonov
2019-01-22  tty: Hold tty_ldisc_lock() during tty_reopen()
            Dmitry Safonov
2019-01-22  tty/ldsem: Wake up readers after timed out down_write()
            Dmitry Safonov

As you describe correctly, the problem seems to be network related.
I'm getting this error too when watching videos from the internet. I'm
currently testing the changes between "restart search when nodes have
been erased" and "Wake up readers after timed out down_write()".
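
To give a feeling for how many more kernel builds that range may need, here is a rough Python sketch of the halving search that git bisect performs, reduced to a straight line of commits. It assumes the list above is ordered newest first, that the first bad commit really is inside the range, and that a test always gives a clear good/bad answer. `bisect_first_bad` and `CANDIDATES` are names I made up for the illustration, and the culprit is not known; the loop just tries every candidate as a hypothetical culprit.

```python
# Candidate commits from the range being tested, oldest first.
# ASSUMPTION: the list in this message is newest first, so it is reversed here.
CANDIDATES = [
    "tty/ldsem: Wake up readers after timed out down_write()",
    "tty: Hold tty_ldisc_lock() during tty_reopen()",
    "tty: Simplify tty->count math in tty_reopen()",
    "tty: Don't hold ldisc lock in tty_reopen() if ldisc present",
    "can: gw: ensure DLC boundaries after CAN frame modification",
    "netfilter: nf_conncount: replace CONNCOUNT_LOCK_SLOTS with CONNCOUNT_SLOTS",
    "netfilter: nf_conncount: don't skip eviction when age is negative",
    "netfilter: nf_conncount: split gc in two phases",
    "netfilter: nf_conncount: restart search when nodes have been erased",
]


def bisect_first_bad(commits, is_bad):
    """Binary search for the first bad commit in a linear, oldest-first list.

    Assumes the newest commit is bad and everything before the first bad
    commit is good. Returns (first_bad_commit, list_of_commits_tested).
    """
    lo, hi = 0, len(commits) - 1  # invariant: first bad commit is in [lo, hi]
    tested = []
    while lo < hi:
        mid = (lo + hi) // 2
        tested.append(commits[mid])
        if is_bad(commits[mid]):
            hi = mid          # culprit is at mid or older
        else:
            lo = mid + 1      # culprit is newer than mid
    return commits[lo], tested


# The real culprit is unknown, so try each candidate as a HYPOTHETICAL culprit
# and record how many test builds the search would need in each case.
tests_needed = []
for culprit_index, culprit in enumerate(CANDIDATES):
    found, tested = bisect_first_bad(
        CANDIDATES,
        lambda commit, c=culprit_index: CANDIDATES.index(commit) >= c,
    )
    assert found == culprit
    tests_needed.append(len(tested))

worst_case = max(tests_needed)
print(f"{len(CANDIDATES)} candidates, at most {worst_case} test builds")
```

With nine candidates left, the sketch needs at most four test builds whichever commit turns out to be the culprit. Real git bisect picks its midpoints on the commit graph rather than on a flat list, so the exact commits it asks for can differ.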

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1690085

Title:
  Ryzen 1800X freeze - rcu_sched detected stalls on CPUs/tasks

Status in Linux:
  Confirmed
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Hi,

  
  We are getting various kernel crashes on a fairly new configuration.
  We're using a Ryzen 1800X CPU with an X370 Gaming Pro Carbon MB (7A32V1)
  with the latest BIOS available (1.52).

  We are running Ubuntu 17.04 (amd64). We've tried different kernel versions:
  the native one and releases from http://kernel.ubuntu.com/~kernel-ppa/mainline/ too.
  Tested kernel versions:

  native 17.04 kernel
  4.10.15

  The issue is the same on all of them: we get random freezes on the machine.

  Here is the kern.log entry when it happens:

  May 10 22:41:56 dev2 kernel: [24366.186246] INFO: rcu_sched detected stalls on CPUs/tasks:
  May 10 22:41:56 dev2 kernel: [24366.187618]     0-...: (1 GPs behind) idle=49b/1/0 softirq=28561/28563 fqs=913449
  May 10 22:41:56 dev2 kernel: [24366.188977]     (detected by 12, t=1860207 jiffies, g=10001, c=10000, q=4656)
  May 10 22:41:56 dev2 kernel: [24366.190344] Task dump for CPU 0:
  May 10 22:41:56 dev2 kernel: [24366.190345] swapper/0       R  running task        0     0      0 0x00000008
  May 10 22:41:56 dev2 kernel: [24366.190348] Call Trace:
  May 10 22:41:56 dev2 kernel: [24366.190354]  ? native_safe_halt+0x6/0x10
  May 10 22:41:56 dev2 kernel: [24366.190355]  ? default_idle+0x20/0xd0
  May 10 22:41:56 dev2 kernel: [24366.190358]  ? arch_cpu_idle+0xf/0x20
  May 10 22:41:56 dev2 kernel: [24366.190360]  ? default_idle_call+0x23/0x30
  May 10 22:41:56 dev2 kernel: [24366.190362]  ? do_idle+0x16f/0x200
  May 10 22:41:56 dev2 kernel: [24366.190364]  ? cpu_startup_entry+0x71/0x80
  May 10 22:41:56 dev2 kernel: [24366.190366]  ? rest_init+0x77/0x80
  May 10 22:41:56 dev2 kernel: [24366.190368]  ? start_kernel+0x464/0x485
  May 10 22:41:56 dev2 kernel: [24366.190369]  ? early_idt_handler_array+0x120/0x120
  May 10 22:41:56 dev2 kernel: [24366.190371]  ? x86_64_start_reservations+0x24/0x26
  May 10 22:41:56 dev2 kernel: [24366.190372]  ? x86_64_start_kernel+0x14d/0x170
  May 10 22:41:56 dev2 kernel: [24366.190373]  ? start_cpu+0x14/0x14
  May 10 22:44:56 dev2 kernel: [24546.188093] INFO: rcu_sched detected stalls on CPUs/tasks:
  May 10 22:44:56 dev2 kernel: [24546.189461]     0-...: (1 GPs behind) idle=49b/1/0 softirq=28561/28563 fqs=935027
  May 10 22:44:56 dev2 kernel: [24546.190823]     (detected by 14, t=1905212 jiffies, g=10001, c=10000, q=4740)
  May 10 22:44:56 dev2 kernel: [24546.192191] Task dump for CPU 0:
  May 10 22:44:56 dev2 kernel: [24546.192192] swapper/0       R  running task        0     0      0 0x00000008
  May 10 22:44:56 dev2 kernel: [24546.192195] Call Trace:
  May 10 22:44:56 dev2 kernel: [24546.192199]  ? native_safe_halt+0x6/0x10
  May 10 22:44:56 dev2 kernel: [24546.192201]  ? default_idle+0x20/0xd0
  May 10 22:44:56 dev2 kernel: [24546.192203]  ? arch_cpu_idle+0xf/0x20
  May 10 22:44:56 dev2 kernel: [24546.192204]  ? default_idle_call+0x23/0x30
  May 10 22:44:56 dev2 kernel: [24546.192206]  ? do_idle+0x16f/0x200
  May 10 22:44:56 dev2 kernel: [24546.192208]  ? cpu_startup_entry+0x71/0x80
  May 10 22:44:56 dev2 kernel: [24546.192210]  ? rest_init+0x77/0x80
  May 10 22:44:56 dev2 kernel: [24546.192211]  ? start_kernel+0x464/0x485
  May 10 22:44:56 dev2 kernel: [24546.192213]  ? early_idt_handler_array+0x120/0x120
  May 10 22:44:56 dev2 kernel: [24546.192214]  ? x86_64_start_reservations+0x24/0x26
  May 10 22:44:56 dev2 kernel: [24546.192215]  ? x86_64_start_kernel+0x14d/0x170
  May 10 22:44:56 dev2 kernel: [24546.192217]  ? start_cpu+0x14/0x14
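
The two "detected by" lines above carry enough information to estimate how long CPU 0 had already been stalled. The following Python sketch is only an illustration: it assumes `t=` is the number of jiffies since the stalled grace period started, derives the tick rate from the two reports instead of reading the kernel config, and uses names (`parse_stall_reports`, `DETECT_RE`) that are invented for this example.

```python
import re

# The two "detected by" lines from the log above (wrapped lines rejoined).
LOG = """\
May 10 22:41:56 dev2 kernel: [24366.188977]     (detected by 12, t=1860207 jiffies, g=10001, c=10000, q=4656)
May 10 22:44:56 dev2 kernel: [24546.190823]     (detected by 14, t=1905212 jiffies, g=10001, c=10000, q=4740)
"""

DETECT_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+\(detected by (\d+), t=(\d+) jiffies"
)


def parse_stall_reports(text):
    """Return (kernel_timestamp_s, detecting_cpu, t_jiffies) per report."""
    return [
        (float(ts), int(cpu), int(jiffies))
        for ts, cpu, jiffies in DETECT_RE.findall(text)
    ]


reports = parse_stall_reports(LOG)
(ts_first, _, t_first), (ts_last, _, t_last) = reports[0], reports[-1]

# Jiffies elapsed between the reports divided by the seconds elapsed between
# them gives the tick rate. ASSUMPTION: both reports belong to the same stall.
hz = (t_last - t_first) / (ts_last - ts_first)

# ASSUMPTION: t= counts jiffies since the stalled grace period began.
stall_seconds = t_first / round(hz)
stall_start_uptime = ts_first - stall_seconds

print(f"estimated HZ:            {hz:.1f}")
print(f"stall length at report:  {stall_seconds / 3600:.2f} h")
print(f"uptime when stall began: {stall_start_uptime / 3600:.2f} h")
```

Under these assumptions the tick rate comes out at about 250 per second, and the first report was printed when the grace period had already been stuck for roughly two hours.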

  Depending on the kernel version, we also get NMI watchdog errors about a
  stuck CPU (the CPU core id they mention is random).
  The crash happens randomly, but generally after some hours (3-4h).

  We installed kernel 4.11.0-041100-generic #201705041534 this morning and
  are now waiting for a crash...
  For now, the machine is not "used"; at least, it's not under CPU stress...

  
  Thanks
  --- 
  ApportVersion: 2.20.4-0ubuntu4
  Architecture: amd64
  DistroRelease: Ubuntu 17.04
  InstallationDate: Installed on 2017-05-09 (1 days ago)
  InstallationMedia: Ubuntu-Server 17.04 "Zesty Zapus" - Release amd64 (20170412)
  Package: linux (not installed)
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=fr_FR.UTF-8
   SHELL=/bin/bash
  Tags:  zesty
  Uname: Linux 4.11.0-041100-generic x86_64
  UnreportableReason: The running kernel is not an Ubuntu kernel
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
   
  _MarkForUpload: True

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1690085/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
