Upon further testing, the Lunar 6.2 kernel does not seem to be affected.
I'll investigate further to find out why.

** Changed in: linux-azure (Ubuntu Lunar)
       Status: New => Invalid

** Description changed:

  [Other]
  
  Only Ubuntu azure kernels are affected :
  
  - Jammy 5.15
- - Lunar 6.2
  
  Focal is also affected since it's using 5.15 kernel.
  This commit does not appear in Mantic 6.5 kernel.
  
- 
- 
  [1] https://bugzilla.kernel.org/show_bug.cgi?id=215515
  [2] 
https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-azure/+git/jammy/commit/?h=Ubuntu-azure-5.15.0-1043.50&id=75af0c10b3703400890d314d1d91d25294234a81

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/2042568

Title:
  Azure - Kernel crashes when removing gpu from pci

Status in linux-azure package in Ubuntu:
  New
Status in linux-azure source package in Jammy:
  New
Status in linux-azure source package in Lunar:
  Invalid

Bug description:
  [Description]

  On an Azure VM with a Tesla GPU, it was noticed that removing the GPU
  from the PCI bus crashes the VM. If the NVIDIA drivers are loaded, the
  machine does not crash immediately; instead, the removal process hangs
  and the machine crashes on reboot.

  This is related to bug [1].
  The bug reported in [1] concerns a different driver, but the root cause
  is the same. It is still under investigation whether this is a bug in
  the PCI subsystem or in the way various drivers use it.

  For this case, we have identified that reverting commit [2] prevents
  the kernel crashes.

  Azure has requested that this commit be reverted, at least for the
  time being. The commit is not upstream, so it only needs to be
  reverted from the Ubuntu kernels.
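  A minimal sketch of how the revert could be applied to a local clone of
  the linux-azure Jammy tree, assuming git is installed. The helper name
  revert_if_present is hypothetical and is not part of any Ubuntu kernel
  tooling; it only reverts the commit when it is reachable from HEAD.

  ```shell
  # Hypothetical helper: revert COMMIT in REPO_DIR only if HEAD contains it.
  # Usage: revert_if_present REPO_DIR COMMIT
  revert_if_present() {
      repo="$1"
      commit="$2"
      # merge-base --is-ancestor exits 0 when $commit is an ancestor of
      # (or equal to) HEAD, and non-zero otherwise or for an unknown commit.
      if git -C "$repo" merge-base --is-ancestor "$commit" HEAD 2>/dev/null; then
          # --no-edit keeps git's default "Revert ..." commit message.
          git -C "$repo" revert --no-edit "$commit"
      else
          echo "commit $commit not found in $repo HEAD; nothing to revert" >&2
          return 1
      fi
  }
  ```

  For example, from inside a clone checked out at the
  Ubuntu-azure-5.15.0-1043.50 tag:

    revert_if_present . 75af0c10b3703400890d314d1d91d25294234a81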

  [Test Case]

  On an Azure VM with a GPU:

  # echo '1' > /sys/bus/pci/devices/0001:00:00.0/remove

  where '0001:00:00.0' is the PCI address of the GPU.
  The VM will crash.
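  A minimal sketch for finding the GPU's PCI address before running the
  test case, assuming an NVIDIA GPU (PCI vendor ID 0x10de) and the
  standard sysfs layout under /sys/bus/pci/devices. Both function names
  are hypothetical. Only run remove_pci_device on a disposable test VM,
  since on affected kernels it is expected to crash the VM.

  ```shell
  # Hypothetical helper: print the address of every PCI device whose
  # sysfs "vendor" attribute is 0x10de (NVIDIA).
  find_nvidia_devices() {
      root="${1:-/sys/bus/pci/devices}"
      for d in "$root"/*; do
          [ -r "$d/vendor" ] || continue
          if [ "$(cat "$d/vendor")" = "0x10de" ]; then
              basename "$d"
          fi
      done
  }

  # Hypothetical helper: write 1 to the device's sysfs "remove" attribute,
  # which is the same operation as the echo command in the test case.
  remove_pci_device() {
      addr="$1"
      root="${2:-/sys/bus/pci/devices}"
      node="$root/$addr/remove"
      if [ ! -e "$node" ]; then
          echo "no PCI device at $addr" >&2
          return 1
      fi
      echo 1 > "$node"
  }
  ```

  Run find_nvidia_devices to list candidate addresses, then, as root,
  pass the GPU's address to remove_pci_device.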

  [Where things could go wrong]

  The commit to be reverted was included in a patchset addressing LP
  bugs https://bugs.launchpad.net/bugs/2023071 and
  https://bugs.launchpad.net/bugs/2023594

  However, this commit only reduces boot time, so reverting it should
  not introduce any regressions. The expected side effect is an
  increase in boot time.

  [Other]

  Only the Ubuntu Azure kernels are affected:

  - Jammy 5.15

  Focal is also affected, since it uses the 5.15 kernel.
  This commit does not appear in the Mantic 6.5 kernel.
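  A minimal sketch for checking whether a running kernel belongs to the
  affected series, assuming Ubuntu's usual '<version>-<abi>-azure'
  release string (for example 5.15.0-1043-azure). The function name is
  hypothetical, and it only matches the 5.15 Azure series; it does not
  confirm that commit [2] is present in a given build.

  ```shell
  # Hypothetical helper: succeed (exit 0) if the kernel release string
  # looks like a 5.15 Azure kernel (Jammy, or Focal using 5.15).
  is_possibly_affected() {
      case "$1" in
          5.15.*-azure) return 0 ;;
          *) return 1 ;;
      esac
  }
  ```

    is_possibly_affected "$(uname -r)" && echo "5.15 Azure kernel: check for commit [2]"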

  [1] https://bugzilla.kernel.org/show_bug.cgi?id=215515
  [2] https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-azure/+git/jammy/commit/?h=Ubuntu-azure-5.15.0-1043.50&id=75af0c10b3703400890d314d1d91d25294234a81

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/2042568/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp