** Description changed:

- Create a VM on Azure from the latest Canonical UbuntuServer
- 14.04.5-LTS image, size Standard DS2 v2
+ [Impact]
  
- Install kexec-tools, kdump-tools and makedumpfile; configure the boot
- kernel parameter crashkernel=512M
+ * When kdumping on Trusty/4.15 in an Azure instance, we quite
+ frequently observe a stall in the kdump kernel: it blocks, and soon we
+ see a trace like the following:
  
- Run sed -i 's/USE_KDUMP=0/USE_KDUMP=1/g' /etc/default/kdump-tools and
- sed -i 's/LOAD_KEXEC=true/LOAD_KEXEC=false/g' /etc/default/kexec
+ [ 65.452007] INFO: rcu_sched detected stalls on CPUs/tasks:
+ [ 65.456004] 1-...!: (0 ticks this GP) idle=488/0/0 softirq=1/1 fqs=0
+ [ 65.456004] (detected by 0, t=15002 jiffies, g=707, c=706, q=8457)
+ [ 65.456004] rcu_sched kthread starved for 15002 jiffies! g707 c706 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1
  
- Reboot the VM and make sure the crash kernel memory is reserved
- successfully
+ * Using the Azure serial console, we collected a sysrq-w dump when
+ the issue happens:
+ [  529.515013] sysrq: Show Blocked State
+ [  529.517730]   task                        PC stack   pid father
+ [  529.519006] kworker/u4:2    D    0    94      2 0x80000000
+ [  529.519006] Workqueue: events_unbound fsnotify_mark_destroy_workfn
+ [  529.519006] Call Trace:
+ [  529.519006]  __schedule+0x292/0x880
+ [  529.519006]  schedule+0x36/0x80
+ [  529.519006]  schedule_timeout+0x1d5/0x2f0
+ [  529.519006]  ? check_preempt_wakeup+0x162/0x260
+ [  529.519006]  wait_for_completion+0xa5/0x110
+ [  529.519006]  ? wake_up_q+0x80/0x80
+ [  529.519006]  __synchronize_srcu.part.14+0x67/0x80
+ [  529.519006]  ? trace_raw_output_rcu_utilization+0x50/0x50
+ [  529.519006]  ? __switch_to_asm+0x41/0x70
+ [  529.519006]  synchronize_srcu+0xd1/0xd6
+ [  529.519006]  fsnotify_mark_destroy_workfn+0x6d/0xc0
+ [  529.519006]  process_one_work+0x14e/0x390
+ [  529.519006]  worker_thread+0x1cc/0x3d0
+ [  529.519006]  kthread+0x105/0x140
+ [  529.519006]  ? max_active_store+0x60/0x60
+ [  529.519006]  ? kthread_bind+0x20/0x20
+ [  529.519006]  ret_from_fork+0x35/0x40
+ [  529.519006] udevadm         D    0   544      1 0x00000000
+ [  529.519006] Call Trace:
+ [  529.519006]  __schedule+0x292/0x880
+ [  529.519006]  schedule+0x36/0x80
+ [  529.519006]  schedule_timeout+0x1d5/0x2f0
+ [  529.519006]  ? try_to_wake_up+0x4a/0x460
+ [  529.519006]  ? try_to_wake_up+0x4a/0x460
+ [  529.519006]  wait_for_completion+0xa5/0x110
+ [  529.519006]  ? wake_up_q+0x80/0x80
+ [  529.519006]  __flush_work.isra.29+0x119/0x1b0
+ [  529.519006]  ? destroy_worker+0x90/0x90
+ [  529.519006]  flush_delayed_work+0x3f/0x50
+ [  529.519006]  fsnotify_wait_marks_destroyed+0x15/0x20
+ [  529.519006]  fsnotify_destroy_group+0x4e/0xc0
+ [  529.519006]  inotify_release+0x1e/0x50
+ [  529.519006]  __fput+0xea/0x220
+ [  529.519006]  ____fput+0xe/0x10
+ [  529.519006]  task_work_run+0x8c/0xb0
+ [  529.519006]  exit_to_usermode_loop+0x70/0xa9
+ [  529.519006]  do_syscall_64+0x1b5/0x1e0
+ [  529.519006]  entry_SYSCALL_64_after_hwframe+0x41/0xa6
+ [  529.519006] dhclient        D    0   573    572 0x00000000
+ [  529.519006] Call Trace:
+ [  529.519006]  __schedule+0x292/0x880
+ [  529.519006]  schedule+0x36/0x80
+ [  529.519006]  schedule_timeout+0x1d5/0x2f0
+ [  529.519006]  ? aa_profile_af_perm+0xb4/0xf0
+ [  529.519006]  wait_for_completion+0xa5/0x110
+ [  529.519006]  ? wake_up_q+0x80/0x80
+ [  529.519006]  __wait_rcu_gp+0x123/0x150
+ [  529.519006]  synchronize_sched+0x4e/0x60
+ [  529.519006]  ? __call_rcu+0x2f0/0x2f0
+ [  529.519006]  ? trace_raw_output_rcu_utilization+0x50/0x50
+ [  529.519006]  synchronize_net+0x1c/0x30
+ [  529.519006]  __unregister_prot_hook+0xcd/0xf0
+ [  529.519006]  packet_do_bind+0x1bd/0x250
+ [  529.519006]  packet_bind+0x2f/0x50
+ [  529.519006]  SYSC_bind+0xd8/0x110
+ [  529.519006]  ? sock_alloc_file+0x91/0x130
+ [  529.519006]  SyS_bind+0xe/0x10
+ [  529.519006]  do_syscall_64+0x80/0x1e0
+ [  529.519006]  entry_SYSCALL_64_after_hwframe+0x41/0xa6
  
- Trigger a crash by running echo 1 > /proc/sys/kernel/sysrq, then echo
- c > /proc/sysrq-trigger; the VM will reboot automatically, and after
- the reboot, check that a crash dump was generated under /var/crash
+ * Bisecting mainline kernels, we found that v4.16 reproduced the
+ issue whereas v4.17-rc1 did not. A fine-grained git bisect then led us
+ to the fix: d8e462e19305 ("Drivers: hv: vmbus: Implement Direct Mode
+ for stimer0"). When backported to a problematic version, this patch
+ fixes the issue.
  
- Install linux-azure kernel
+ * In Azure/Hyper-V, before the aforementioned commit, synthetic timer
+ (stimer0) interrupts were delivered to the guest through a VMbus
+ message, the communication mechanism between Hyper-V guests and the
+ hypervisor. With the patch, the guest checks a hypervisor feature bit
+ and, if the hypervisor supports it, uses a direct timer interrupt on a
+ dedicated vector instead of the VMbus channel.
  
- Enable private-ppa canonical-kernel-esm
+ * Our theory is that in the kdump kernel, especially due to its
+ single-CPU nature, the VMbus-messaged timer interrupt could interfere
+ with scheduling and create a deadlock condition, which is what we
+ observe in the stack traces. Hence, we propose backporting this patch
+ to the Azure 4.15 kernels.
  
- Install the linux-azure kernel and reboot the VM; the kernel version
- is 4.15.0-1084-azure
+ * Our hypothesis for why this wasn't observed in Bionic is that
+ changes in the kdump mechanism make it start earlier, which might hide
+ the scheduling issue. A lack of testing could also explain why it was
+ never observed there.
  
- Trigger a crash by running echo 1 > /proc/sys/kernel/sysrq, then echo
- c > /proc/sysrq-trigger; the VM entered a hung state.
+ [Test Case]
  
- Attaching the whole serial console log:
- [   65.452007] INFO: rcu_sched detected stalls on CPUs/tasks:
- [   65.456004]  1-...!: (0 ticks this GP) idle=488/0/0 softirq=1/1 fqs=0
- [   65.456004]  (detected by 0, t=15002 jiffies, g=707, c=706, q=8457)
- [   65.456004] rcu_sched kthread starved for 15002 jiffies! g707 c706 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1
- [  360.020026] INFO: rcu_sched detected stalls on CPUs/tasks:
- [  360.024015]  1-...!: (10 GPs behind) idle=b34/0/0 softirq=1/1 fqs=1
- [  360.024015]  (detected by 0, t=15002 jiffies, g=717, c=716, q=6429)
- [  360.024015] rcu_sched kthread starved for 15000 jiffies! g717 c716 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1
- [  420.048010] INFO: rcu_sched detected stalls on CPUs/tasks:
- [  420.052006]  1-...!: (0 ticks this GP) idle=f94/0/0 softirq=1/1 fqs=0
- [  420.052006]  (detected by 0, t=15002 jiffies, g=718, c=717, q=6429)
- [  420.052006] rcu_sched kthread starved for 15002 jiffies! g718 c717 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1
+ * Perform a kdump on Trusty/4.15; after some rounds, you should see
+ the RCU stall trace above. With the patch, no issue is observed, and
+ we were able to perform ~30 kdumps in a row.
+ 
+ [Where problems could occur]
+ 
+ * Since the patch touches IRQ code (although restricted to Hyper-V),
+ it's a somewhat invasive change. It could lead to issues in the IRQ
+ subsystem, although the patch has been upstream since 2018 and no
+ follow-up fixes for it were found (we rely on Fixes: tags to track
+ those).

** Also affects: linux-azure (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Changed in: linux-azure (Ubuntu)
     Assignee: Guilherme G. Piccoli (gpiccoli) => (unassigned)

** Changed in: linux-azure (Ubuntu)
     Assignee: (unassigned) => Marcelo Cerri (mhcerri)

** Changed in: linux-azure (Ubuntu Bionic)
     Assignee: (unassigned) => Marcelo Cerri (mhcerri)

** Changed in: linux-azure (Ubuntu Bionic)
       Status: New => In Progress

** Changed in: linux-azure (Ubuntu)
   Importance: Undecided => Medium

** Changed in: linux-azure (Ubuntu Bionic)
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1882623

Title:
  VM enter into hung status after triggering a crash

Status in linux-azure package in Ubuntu:
  In Progress
Status in linux-azure source package in Bionic:
  In Progress

Bug description:
  [Impact]

  * When kdumping on Trusty/4.15 in an Azure instance, we quite
  frequently observe a stall in the kdump kernel: it blocks, and soon
  we see a trace like the following:

  [ 65.452007] INFO: rcu_sched detected stalls on CPUs/tasks:
  [ 65.456004] 1-...!: (0 ticks this GP) idle=488/0/0 softirq=1/1 fqs=0
  [ 65.456004] (detected by 0, t=15002 jiffies, g=707, c=706, q=8457)
  [ 65.456004] rcu_sched kthread starved for 15002 jiffies! g707 c706 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1

  * Using the Azure serial console, we collected a sysrq-w dump when
  the issue happens:
  [  529.515013] sysrq: Show Blocked State
  [  529.517730]   task                        PC stack   pid father
  [  529.519006] kworker/u4:2    D    0    94      2 0x80000000
  [  529.519006] Workqueue: events_unbound fsnotify_mark_destroy_workfn
  [  529.519006] Call Trace:
  [  529.519006]  __schedule+0x292/0x880
  [  529.519006]  schedule+0x36/0x80
  [  529.519006]  schedule_timeout+0x1d5/0x2f0
  [  529.519006]  ? check_preempt_wakeup+0x162/0x260
  [  529.519006]  wait_for_completion+0xa5/0x110
  [  529.519006]  ? wake_up_q+0x80/0x80
  [  529.519006]  __synchronize_srcu.part.14+0x67/0x80
  [  529.519006]  ? trace_raw_output_rcu_utilization+0x50/0x50
  [  529.519006]  ? __switch_to_asm+0x41/0x70
  [  529.519006]  synchronize_srcu+0xd1/0xd6
  [  529.519006]  fsnotify_mark_destroy_workfn+0x6d/0xc0
  [  529.519006]  process_one_work+0x14e/0x390
  [  529.519006]  worker_thread+0x1cc/0x3d0
  [  529.519006]  kthread+0x105/0x140
  [  529.519006]  ? max_active_store+0x60/0x60
  [  529.519006]  ? kthread_bind+0x20/0x20
  [  529.519006]  ret_from_fork+0x35/0x40
  [  529.519006] udevadm         D    0   544      1 0x00000000
  [  529.519006] Call Trace:
  [  529.519006]  __schedule+0x292/0x880
  [  529.519006]  schedule+0x36/0x80
  [  529.519006]  schedule_timeout+0x1d5/0x2f0
  [  529.519006]  ? try_to_wake_up+0x4a/0x460
  [  529.519006]  ? try_to_wake_up+0x4a/0x460
  [  529.519006]  wait_for_completion+0xa5/0x110
  [  529.519006]  ? wake_up_q+0x80/0x80
  [  529.519006]  __flush_work.isra.29+0x119/0x1b0
  [  529.519006]  ? destroy_worker+0x90/0x90
  [  529.519006]  flush_delayed_work+0x3f/0x50
  [  529.519006]  fsnotify_wait_marks_destroyed+0x15/0x20
  [  529.519006]  fsnotify_destroy_group+0x4e/0xc0
  [  529.519006]  inotify_release+0x1e/0x50
  [  529.519006]  __fput+0xea/0x220
  [  529.519006]  ____fput+0xe/0x10
  [  529.519006]  task_work_run+0x8c/0xb0
  [  529.519006]  exit_to_usermode_loop+0x70/0xa9
  [  529.519006]  do_syscall_64+0x1b5/0x1e0
  [  529.519006]  entry_SYSCALL_64_after_hwframe+0x41/0xa6
  [  529.519006] dhclient        D    0   573    572 0x00000000
  [  529.519006] Call Trace:
  [  529.519006]  __schedule+0x292/0x880
  [  529.519006]  schedule+0x36/0x80
  [  529.519006]  schedule_timeout+0x1d5/0x2f0
  [  529.519006]  ? aa_profile_af_perm+0xb4/0xf0
  [  529.519006]  wait_for_completion+0xa5/0x110
  [  529.519006]  ? wake_up_q+0x80/0x80
  [  529.519006]  __wait_rcu_gp+0x123/0x150
  [  529.519006]  synchronize_sched+0x4e/0x60
  [  529.519006]  ? __call_rcu+0x2f0/0x2f0
  [  529.519006]  ? trace_raw_output_rcu_utilization+0x50/0x50
  [  529.519006]  synchronize_net+0x1c/0x30
  [  529.519006]  __unregister_prot_hook+0xcd/0xf0
  [  529.519006]  packet_do_bind+0x1bd/0x250
  [  529.519006]  packet_bind+0x2f/0x50
  [  529.519006]  SYSC_bind+0xd8/0x110
  [  529.519006]  ? sock_alloc_file+0x91/0x130
  [  529.519006]  SyS_bind+0xe/0x10
  [  529.519006]  do_syscall_64+0x80/0x1e0
  [  529.519006]  entry_SYSCALL_64_after_hwframe+0x41/0xa6

  * Bisecting mainline kernels, we found that v4.16 reproduced the
  issue whereas v4.17-rc1 did not. A fine-grained git bisect then led
  us to the fix: d8e462e19305 ("Drivers: hv: vmbus: Implement Direct
  Mode for stimer0"). When backported to a problematic version, this
  patch fixes the issue.

  * In Azure/Hyper-V, before the aforementioned commit, synthetic
  timer (stimer0) interrupts were delivered to the guest through a
  VMbus message, the communication mechanism between Hyper-V guests
  and the hypervisor. With the patch, the guest checks a hypervisor
  feature bit and, if the hypervisor supports it, uses a direct timer
  interrupt on a dedicated vector instead of the VMbus channel.
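
  For illustration, below is a minimal sketch of the logic the patch
  introduces, paraphrased from the upstream commit; the identifiers
  and the feature-bit name are approximations from memory, not
  verbatim kernel API:

  /* Sketch only: simplified from d8e462e19305 ("Drivers: hv: vmbus:
   * Implement Direct Mode for stimer0"); identifiers approximated.
   */
  static bool direct_mode_enabled;

  static void stimer0_configure(void)
  {
          union hv_timer_config cfg = { .as_uint64 = 0 };

          /* Feature bit the hypervisor advertises to the guest. */
          direct_mode_enabled = ms_hyperv.misc_features &
                                HV_STIMER_DIRECT_MODE_AVAILABLE;

          cfg.enable = 1;
          cfg.auto_enable = 1;
          if (direct_mode_enabled) {
                  /* On expiry, interrupt the guest directly on a
                   * dedicated vector; no VMbus message is involved.
                   */
                  cfg.direct_mode = 1;
                  cfg.apic_vector = HYPERV_STIMER0_VECTOR;
          } else {
                  /* Legacy path: expiry is signaled with a VMbus
                   * message on the shared SINT.
                   */
                  cfg.direct_mode = 0;
                  cfg.sintx = VMBUS_MESSAGE_SINT;
          }
          wrmsrl(HV_X64_MSR_STIMER0_CONFIG, cfg.as_uint64);
  }

  In the direct-mode case, the kdump kernel's timer ticks no longer
  depend on the VMbus message path, which is exactly the dependency
  our theory below blames for the stall.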

  * Our theory is that in the kdump kernel, especially due to its
  single-CPU nature, the VMbus-messaged timer interrupt could
  interfere with scheduling and create a deadlock condition, which is
  what we observe in the stack traces. Hence, we propose backporting
  this patch to the Azure 4.15 kernels.

  * Our hypothesis for why this wasn't observed in Bionic is that
  changes in the kdump mechanism make it start earlier, which might
  hide the scheduling issue. A lack of testing could also explain why
  it was never observed there.

  [Test Case]

  * Perform a kdump on Trusty/4.15; after some rounds, you should see
  the RCU stall trace above. With the patch, no issue is observed, and
  we were able to perform ~30 kdumps in a row.

  [Where problems could occur]

  * Since the patch touches IRQ code (although restricted to Hyper-V),
  it's a somewhat invasive change. It could lead to issues in the IRQ
  subsystem, although the patch has been upstream since 2018 and no
  follow-up fixes for it were found (we rely on Fixes: tags to track
  those).

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1882623/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
