The fix is in the PCI tree now:

"PCI: hv: Fix hibernation in case interrupts are not re-created"
(https://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/pci.git/commit/?h=pci/hv&id=915cff7f38c5e4d47f187f8049245afc2cb3e503)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1894893

Title:
  [linux-azure][hibernation] GPU device no longer working after resume
  from hibernation in NV6 VM size

Status in linux-azure package in Ubuntu:
  New

Bug description:
  Failure messages appear in dmesg after resuming from hibernation in an
  NV6 (GPU passthrough size) VM in Azure:
  [ 1432.153730] hv_pci 47505500-0001-0000-3130-444531334632: hv_irq_unmask() failed: 0x5
  [ 1432.167910] hv_pci 47505500-0001-0000-3130-444531334632: hv_irq_unmask() failed: 0x5

  This happens with the latest stable release of the linux-azure kernel
  (5.4.0-1023.23) and with the latest mainline Linux kernel.

  How reproducible: 
  100%

  Steps to Reproduce:
  1. Start a Standard_NV6 VM in Azure and enable hibernation properly
  (please refer to
  https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1880032/comments/14 )

  E.g. here I create a Generation-1 Ubuntu 20.04 Standard NV6_Promo (6
  vcpus, 56 GiB memory) VM in East US 2.

  2. Make sure the in-kernel open-source nouveau driver is loaded, or
  blacklist the nouveau driver and install the official Nvidia GPU
  driver (please follow
  https://docs.microsoft.com/en-us/azure/virtual-machines/linux/n-series-driver-setup :
  "Install GRID drivers on NV or NVv3-series VMs"; the most important
  step is to run "./NVIDIA-Linux-x86_64-grid.run").

  3. Run hibernation from serial console
  # systemctl hibernate

  4. After hibernation finishes, start the VM and check dmesg
  # dmesg|grep fail

  Actual results:
  [ 1432.153730] hv_pci 47505500-0001-0000-3130-444531334632: hv_irq_unmask() failed: 0x5
  [ 1432.167910] hv_pci 47505500-0001-0000-3130-444531334632: hv_irq_unmask() failed: 0x5

  And /proc/interrupts shows that the GPU interrupts are no longer
  happening.
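
  One way to confirm that the GPU interrupts have stopped is to compare
  two snapshots of /proc/interrupts taken a few seconds apart after
  resume. The following is only a rough sketch, not part of the original
  report: the function names are hypothetical, and the keyword used to
  select the GPU lines (for example "nvidia" or "nouveau") is an
  assumption that depends on which driver is loaded.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts text into {irq_label: (total_count, description)}."""
    lines = text.strip("\n").splitlines()
    if not lines:
        return {}
    # The header row lists one column per online CPU (CPU0, CPU1, ...).
    ncpu = len(lines[0].split())
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        # Some rows (e.g. ERR, MIS) carry fewer counters than there are CPUs.
        for tok in fields[:ncpu]:
            if not tok.isdigit():
                break
            counts.append(int(tok))
        description = " ".join(fields[len(counts):])
        table[label.strip()] = (sum(counts), description)
    return table


def stalled_irqs(before, after, keyword):
    """Return IRQ labels matching `keyword` whose total count did not change."""
    old = parse_interrupts(before)
    new = parse_interrupts(after)
    return sorted(
        irq
        for irq, (count, description) in new.items()
        if keyword in description and irq in old and count == old[irq][0]
    )
```

  For example, read /proc/interrupts into a string, wait a few seconds
  while the GPU is in use, read it again, and pass both strings to
  stalled_irqs() with the driver name as the keyword. An idle device can
  also show unchanged counters, so a match is a hint rather than proof.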

  Expected results:
  No failed logs, and the GPU interrupt should still happen after hibernation.

  
  BUG FIX:
  I made a fix here: https://lkml.org/lkml/2020/9/4/1268.

  Without the patch, we see the error "hv_pci
  47505500-0001-0000-3130-444531334632: hv_irq_unmask() failed: 0x5"
  during hibernation when the VM has the Nvidia GPU driver loaded, and
  after hibernation the GPU driver can no longer receive any MSI/MSI-X
  interrupts when we check /proc/interrupts.

  With the patch, we should no longer see the error, and the GPU driver
  should still receive interrupts after hibernation.
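
  To check a kernel log for the error before and after applying the
  patch, something like the sketch below could be used. It is an
  illustration only: the function name is hypothetical, and the regular
  expression is modelled on the two log lines quoted in this report, so
  it may need adjusting if the message format differs on other kernels.

```python
import re

# Matches lines such as:
# [ 1432.153730] hv_pci 47505500-0001-0000-3130-444531334632: hv_irq_unmask() failed: 0x5
UNMASK_FAILURE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+hv_pci\s+(?P<dev>\S+):\s+"
    r"hv_irq_unmask\(\)\s+failed:\s+(?P<status>0x[0-9a-fA-F]+)"
)


def find_unmask_failures(dmesg_text):
    """Return (timestamp, device, status) for each hv_irq_unmask() failure."""
    return [
        (float(m.group("ts")), m.group("dev"), int(m.group("status"), 16))
        for m in UNMASK_FAILURE.finditer(dmesg_text)
    ]
```

  The input can be the saved output of dmesg. An empty result after a
  hibernate/resume cycle is consistent with the fix working, although
  /proc/interrupts should still be checked to confirm that the GPU
  interrupts keep arriving.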

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1894893/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
