Hi Mehmet,
I was just able to send the sos report, taken after a crash with NFS debugging 
turned on, to Ubuntu Support (case 00398586).
I have not tested your unofficial kernel version yet, because I need a 
procedure that reproduces the crash within minutes rather than days.
Which procedure should I follow to reproduce the RCU bug?

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2084023

Title:
  "rcu: INFO: rcu_sched self-detected stall on" CPU caused by nfs

Status in linux package in Ubuntu:
  New
Status in linux source package in Jammy:
  In Progress
Status in linux source package in Noble:
  New

Bug description:
  This bug report has been opened because of
  https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/2062568/comments/25.
  Contents:

  
  We installed the unofficial kernel 6.8.0-46-generic-nfs on several NFS client servers on Saturday and have been testing it with high IO loads since then.
  Unfortunately the server crashed again after about 40 hours with "rcu: INFO: rcu_sched self-detected stall on CPU".
  The kernel 6.8.0-46-generic-nfs prevents the error message "RPC: Could not send backchannel reply error: -110",
  but not the crashes, which we have been struggling with since August 19th, when we switched the kernel from 6.5.0-44-generic to 6.8.0-40-generic.

  Our experiences with NFS server crashes are:
  - We were able to reproduce the crashes in our production and test environments: rarely after minutes, sometimes after hours or days, and sometimes not at all, as we often stopped the experiments after 12 to 24 hours.
  - We have not yet been able to reproduce a crash between a bare metal NFS server and a bare metal NFS client, but we have between a bare metal NFS server and a virtualized client.
  - We could not reproduce a crash with NFS vers=4.0.
  - The crashes happen with or without GSSPROXY.

  Our setup:
  - virtualized NFS 4.2 server with Ubuntu 22.04.5 LTS and kernel 5.15.0-122-generic
  - virtualized NFS client with Ubuntu 22.04.5 LTS and kernel 6.8.0-40-generic or kernel 6.8.0-45-generic
  - /etc/exports : /mnt/home nfsclient(sec=krb5,rw,root_squash,sync,no_subtree_check)
  - /etc/fstab : nfsserver:/mnt/home /home nfs vers=4.2,rw,soft,sec=krb5,proto=tcp 0 0
  - apt info nfs-common : Version: 1:2.6.1-1ubuntu1.2

  syslog of NFS server after crash:
  Sep 30 01:15:51 nfs-server.domain.de kernel: rcu: INFO: rcu_sched self-detected stall on CPU
  Sep 30 01:15:51 nfs-server.domain.de kernel: rcu: 54-....: (14998 ticks this GP) idle=2db/1/0x4000000000000000 softirq=32173387/32173387 fqs=7449
  Sep 30 01:15:51 nfs-server.domain.de kernel: (t=15000 jiffies g=144775177 q=49782)
  Sep 30 01:15:51 nfs-server.domain.de kernel: NMI backtrace for cpu 54
  Sep 30 01:15:51 nfs-server.domain.de kernel: CPU: 54 PID: 153154 Comm: kworker/u480:36 Not tainted 5.15.0-122-generic #132-Ubuntu
  Sep 30 01:15:51 nfs-server.domain.de kernel: Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.0 12/17/2019
  Sep 30 01:15:51 nfs-server.domain.de kernel: Workqueue: rpciod rpc_async_schedule [sunrpc]
  Sep 30 01:15:51 nfs-server.domain.de kernel: Call Trace:
  Sep 30 01:15:51 nfs-server.domain.de kernel: <IRQ>
  Sep 30 01:15:51 nfs-server.domain.de kernel: show_stack+0x52/0x5c
  Sep 30 01:15:51 nfs-server.domain.de kernel: dump_stack_lvl+0x4a/0x63
  Sep 30 01:15:51 nfs-server.domain.de kernel: dump_stack+0x10/0x16
  Sep 30 01:15:51 nfs-server.domain.de kernel: nmi_cpu_backtrace.cold+0x4d/0x93
  Sep 30 01:15:51 nfs-server.domain.de kernel: ? lapic_can_unplug_cpu+0x90/0x90
  Sep 30 01:15:51 nfs-server.domain.de kernel: nmi_trigger_cpumask_backtrace+0xec/0x100
  Sep 30 01:15:51 nfs-server.domain.de kernel: arch_trigger_cpumask_backtrace+0x19/0x20
  Sep 30 01:15:51 nfs-server.domain.de kernel: trigger_single_cpu_backtrace+0x44/0x4f
  Sep 30 01:15:51 nfs-server.domain.de kernel: rcu_dump_cpu_stacks+0x102/0x149
  Sep 30 01:15:51 nfs-server.domain.de kernel: print_cpu_stall.cold+0x2f/0xe2
  Sep 30...

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2084023/+subscriptions


