** Description changed:

  [ Impact ]
  
-  * Microsoft Azure NV-series instances with NVidia GRID drivers started
+  * Microsoft Azure NV-series instances with NVidia GRID drivers started
  to experience xserver crashes while following Microsoft's official guide
  to installing Nvidia drivers [1].
  
-  * Root cause analysis showed that it was due to having a device with
+  * Root cause analysis showed that it was due to having a device with
  BusID "PCI:0@<domain_id>:0:0", where domain id is >= 32767 while the
  hyperv_drm kernel module is loaded.
  
-  * Removing either the BusID specification or unloading the hyperv_drm
+  * Removing either the BusID specification or unloading the hyperv_drm
  kernel module seems to fix the crash.
  
-  * The crash is happening while X.server is trying to enumerate PCI
+  * The crash is happening while X.server is trying to enumerate PCI
  devices. X.server dereferences a NULL pointer while trying to access
  the PCI device info.
  
-  * The reason why it only happens while the hyperv_drm kernel module is
+  * The reason why it only happens while the hyperv_drm kernel module is
  loaded is that the hyperv_drm module does not expose PCI hardware
  information since it's a virtual device.
  
-  * The upstream patch [2] addresses the issue and it's confirmed that
+  * The upstream patch [2] addresses the issue and it's confirmed that
  the xserver with the patch does not experience the crash.
  
-  * Ubuntu Focal `xorg-server` package does not include the patch [2] at
+  * Ubuntu Focal `xorg-server` package does not include the patch [2] at
  the moment (xserver-xorg-core=2:1.20.13-1ubuntu1~20.04.6).
  
-  [1]: 
https://learn.microsoft.com/en-us/azure/virtual-machines/linux/n-series-driver-setup#install-grid-drivers-on-nv-or-nvv3-series-vms
-  [2]: 
https://github.com/freedesktop/xorg-xserver/commit/0d93bbfa2cfacbb73741f8bed0e32fa1a656b928
+  [1]: https://learn.microsoft.com/en-us/azure/virtual-machines/linux/n-series-driver-setup#install-grid-drivers-on-nv-or-nvv3-series-vms
+  [2]: https://github.com/freedesktop/xorg-xserver/commit/0d93bbfa2cfacbb73741f8bed0e32fa1a656b928
  
  [ Test Plan ]
  
  Part (a) is quoted from Microsoft's official guide [1].
  
  Part (a):
  
-  * Spawn a Microsoft Azure NV-series instance with an NVidia GRID-supported 
GPU
-    - e.g. `NV36adms A10`
-  * Install updates, required tooling, and the desktop environment:
-    - sudo apt-get update
-    - sudo apt-get upgrade -y
-    - sudo apt-get dist-upgrade -y
-    - sudo apt-get install build-essential ubuntu-desktop -y
-    - sudo apt-get install linux-azure -y
-  * Disable nouveau kernel driver:
-    # Create a blacklist file /etc/modprobe.d/nouveau.conf with following 
contents:
-    blacklist nouveau
-    blacklist lbm-nouveau 
-  * Reboot the VM, re-connect, and then stop X server:
-    - sudo reboot
-    # wait for the reboot, reconnect, and continue:
-    - sudo systemctl stop lightdm.service
-  * Download and install the NVidia GRID driver:
-    - wget -O NVIDIA-Linux-x86_64-grid.run 
https://go.microsoft.com/fwlink/?linkid=874272 
-    - chmod +x NVIDIA-Linux-x86_64-grid.run
-    - sudo ./NVIDIA-Linux-x86_64-grid.run
-    - # When the setup asks whether you want to run the nvidia-xconfig utility 
to update your X configuration file, select Yes.
-  * Copy /etc/nvidia/gridd.conf.template to /etc/nvidia/gridd.conf
-    - sudo cp /etc/nvidia/gridd.conf.template /etc/nvidia/gridd.conf
-  * Edit /etc/nvidia/grid.conf
-    - sudo nano /etc/nvidia/grid.conf
-    # Append the following lines:
-    IgnoreSP=FALSE
-    EnableUI=FALSE
-    # Remove this line if present:
-    FeatureType=0
-    # And save.
-  * Reboot the VM
+  * Spawn a Microsoft Azure NV-series instance with an NVidia GRID-supported GPU
+    - e.g. `NV36adms A10`
+  * Install updates, required tooling, and the desktop environment:
+    - sudo apt-get update
+    - sudo apt-get upgrade -y
+    - sudo apt-get dist-upgrade -y
+    - sudo apt-get install build-essential ubuntu-desktop -y
+    - sudo apt-get install linux-azure -y
+  * Disable nouveau kernel driver:
+    # Create a blacklist file /etc/modprobe.d/nouveau.conf with the following contents:
+    blacklist nouveau
+    blacklist lbm-nouveau
+  * Reboot the VM, re-connect, and then stop X server:
+    - sudo reboot
+    # wait for the reboot, reconnect, and continue:
+    - sudo systemctl stop lightdm.service
+  * Download and install the NVidia GRID driver:
+    - wget -O NVIDIA-Linux-x86_64-grid.run https://go.microsoft.com/fwlink/?linkid=874272
+    - chmod +x NVIDIA-Linux-x86_64-grid.run
+    - sudo ./NVIDIA-Linux-x86_64-grid.run
+    - # When the setup asks whether you want to run the nvidia-xconfig utility to update your X configuration file, select Yes.
+  * Copy /etc/nvidia/gridd.conf.template to /etc/nvidia/gridd.conf
+    - sudo cp /etc/nvidia/gridd.conf.template /etc/nvidia/gridd.conf
+  * Edit /etc/nvidia/gridd.conf
+    - sudo nano /etc/nvidia/gridd.conf
+    # Append the following lines:
+    IgnoreSP=FALSE
+    EnableUI=FALSE
+    # Remove this line if present:
+    FeatureType=0
+    # And save.
+  * Reboot the VM
  
-  Part (b):
+  Part (b):
  
-   * Ensure that the hyperv_drm kernel module is loaded:
-     - sudo modprobe hyperv_drm 
-   * Use the attached xorg.conf file to override /etc/X11/xorg.conf file
-   * try to start the `xserver`:
-     - sudo startx
-   * `xserver` should crash with a similar output to the following:
-   X.Org X Server 1.20.13
-   X Protocol Version 11, Revision 0
-   Build Operating System: linux Ubuntu
-   Current Operating System: Linux a10test 5.15.0-1031-azure 
#38~20.04.1-Ubuntu SMP Mon Jan 9 18:23:48 UTC 2023 x86_64
-   Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.15.0-1031-azure 
root=PARTUUID=4cac852b-afba-447b-b3e7-c002155c1305 ro console=tty1 
console=ttyS0 earlyprintk=ttyS0 panic=-1
-   Build Date: 07 February 2023  12:48:13PM
-   xorg-server 2:1.20.13-1ubuntu1~20.04.6 (For technical support please see 
http://www.ubuntu.com/support) 
-   Current version of pixman: 0.38.4
-     Before reporting problems, check http://wiki.x.org
-     to make sure that you have the latest version.
-   Markers: (--) probed, (**) from config file, (==) default setting,
-     (++) from command line, (!!) notice, (II) informational,
-     (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
-   (==) Log file: "/var/log/Xorg.1.log", Time: Sat Feb 18 10:54:26 2023
-   (==) Using config file: "/etc/X11/xorg.conf"
-   (==) Using system config directory "/usr/share/X11/xorg.conf.d"
-   (EE) 
-   (EE) Backtrace:
-   (EE) 0: /usr/lib/xorg/Xorg (OsLookupColor+0x13c) [0x55e7787c5ecc]
-   (EE) 1: /lib/x86_64-linux-gnu/libpthread.so.0 (funlockfile+0x60) 
[0x7f9576cac420]
-   (EE) 2: /usr/lib/xorg/Xorg (xf86PlatformDeviceCheckBusID+0xa7) 
[0x55e7786c4db7]
-   (EE) 3: /usr/lib/xorg/Xorg (xf86PlatformMatchDriver+0x700) [0x55e7786bf1b0]
-   (EE) 4: /usr/lib/xorg/Xorg (xf86CallDriverProbe+0x5c) [0x55e7786971dc]
-   (EE) 5: /usr/lib/xorg/Xorg (xf86BusConfig+0x43) [0x55e778697b23]
-   (EE) 6: /usr/lib/xorg/Xorg (InitOutput+0x90b) [0x55e7786a59eb]
-   (EE) 7: /usr/lib/xorg/Xorg (InitFonts+0x1d4) [0x55e778667ea4]
-   (EE) 8: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0xf3) 
[0x7f9576ac8083]
-   (EE) 9: /usr/lib/xorg/Xorg (_start+0x2e) [0x55e778651ace]
-   (EE) 
-   (EE) Segmentation fault at address 0x124
-   (EE) 
-   Fatal server error:
-   (EE) Caught signal 11 (Segmentation fault). Server aborting
-   (EE) 
-   (EE) 
-   Please consult the The X.Org Foundation support 
-      at http://wiki.x.org
-    for help. 
-   (EE) Please also check the log file at "/var/log/Xorg.1.log" for additional 
information.
-   (EE) 
-   (EE) Server terminated with error (1). Closing log file.
-   ^Cxinit: giving up
-   xinit: unable to connect to X server: Connection refused
-   xinit: unexpected signal 2
+   * Ensure that the hyperv_drm kernel module is loaded:
+     - sudo modprobe hyperv_drm
+   * Use the attached xorg.conf file to override /etc/X11/xorg.conf file
+   * Try to start the `xserver`:
+     - sudo startx
+   * `xserver` should crash with a similar output to the following:
+   X.Org X Server 1.20.13
+   X Protocol Version 11, Revision 0
+   Build Operating System: linux Ubuntu
+   Current Operating System: Linux a10test 5.15.0-1031-azure #38~20.04.1-Ubuntu SMP Mon Jan 9 18:23:48 UTC 2023 x86_64
+   Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.15.0-1031-azure root=PARTUUID=4cac852b-afba-447b-b3e7-c002155c1305 ro console=tty1 console=ttyS0 earlyprintk=ttyS0 panic=-1
+   Build Date: 07 February 2023  12:48:13PM
+   xorg-server 2:1.20.13-1ubuntu1~20.04.6 (For technical support please see http://www.ubuntu.com/support)
+   Current version of pixman: 0.38.4
+     Before reporting problems, check http://wiki.x.org
+     to make sure that you have the latest version.
+   Markers: (--) probed, (**) from config file, (==) default setting,
+     (++) from command line, (!!) notice, (II) informational,
+     (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
+   (==) Log file: "/var/log/Xorg.1.log", Time: Sat Feb 18 10:54:26 2023
+   (==) Using config file: "/etc/X11/xorg.conf"
+   (==) Using system config directory "/usr/share/X11/xorg.conf.d"
+   (EE)
+   (EE) Backtrace:
+   (EE) 0: /usr/lib/xorg/Xorg (OsLookupColor+0x13c) [0x55e7787c5ecc]
+   (EE) 1: /lib/x86_64-linux-gnu/libpthread.so.0 (funlockfile+0x60) [0x7f9576cac420]
+   (EE) 2: /usr/lib/xorg/Xorg (xf86PlatformDeviceCheckBusID+0xa7) [0x55e7786c4db7]
+   (EE) 3: /usr/lib/xorg/Xorg (xf86PlatformMatchDriver+0x700) [0x55e7786bf1b0]
+   (EE) 4: /usr/lib/xorg/Xorg (xf86CallDriverProbe+0x5c) [0x55e7786971dc]
+   (EE) 5: /usr/lib/xorg/Xorg (xf86BusConfig+0x43) [0x55e778697b23]
+   (EE) 6: /usr/lib/xorg/Xorg (InitOutput+0x90b) [0x55e7786a59eb]
+   (EE) 7: /usr/lib/xorg/Xorg (InitFonts+0x1d4) [0x55e778667ea4]
+   (EE) 8: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0xf3) [0x7f9576ac8083]
+   (EE) 9: /usr/lib/xorg/Xorg (_start+0x2e) [0x55e778651ace]
+   (EE)
+   (EE) Segmentation fault at address 0x124
+   (EE)
+   Fatal server error:
+   (EE) Caught signal 11 (Segmentation fault). Server aborting
+   (EE)
+   (EE)
+   Please consult the The X.Org Foundation support
+      at http://wiki.x.org
+    for help.
+   (EE) Please also check the log file at "/var/log/Xorg.1.log" for additional information.
+   (EE)
+   (EE) Server terminated with error (1). Closing log file.
+   ^Cxinit: giving up
+   xinit: unable to connect to X server: Connection refused
+   xinit: unexpected signal 2
+ 
+ # To verify that the patch fixes the issue:
+ * Enable the following PPA that includes the fix: 
+   - sudo add-apt-repository ppa:mustafakemalgilor/lp2007746
+   - sudo apt update
+ * Install the package
+   - sudo apt install xserver-xorg-core=2:1.20.13-1ubuntu1~20.04.6ubuntu1
+ * Try to start xserver:
+   - sudo startx
+ * xserver should not crash.
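
The pass/fail outcome of the test plan can also be checked from the X server log rather than by eye. The following is only a sketch: the function names are hypothetical, it assumes the log file contains the same "(EE)" lines that are printed to the terminal in the crash output above, and the log path depends on the display number (the report shows /var/log/Xorg.1.log).

```shell
#!/bin/sh
# Hypothetical helpers for the test plan; the names are illustrative only.

# Succeed if the given Xorg log records the segfault quoted in the crash output.
xorg_log_has_segfault() {
    grep -qF 'Caught signal 11 (Segmentation fault)' "$1"
}

# Succeed if the backtrace in the log passes through xf86PlatformDeviceCheckBusID,
# the frame that appears in the reported backtrace.
xorg_log_hits_busid_check() {
    grep -qF 'xf86PlatformDeviceCheckBusID' "$1"
}
```

For example, after `sudo startx`, `xorg_log_has_segfault /var/log/Xorg.1.log && echo "crash reproduced"` should print the message on an unpatched system and print nothing once the patched package is installed, assuming the log path matches the one in the report.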
+    
  
  [ Where problems could occur ]
  
-  * The regression risk is low, given that the patch is well-isolated and
+  * The regression risk is low, given that the patch is well-isolated and
  basically adds a null check that is already assumed to be there in the
  first place.
  
  [ Other Info ]
  
-  * workaround #1: unload hyperv_drm kernel module:
-    - sudo modprobe -r hyperv_drm
-  * workaround #2: Comment out BusID line in /etc/X11/xorg.conf [Device] 
section:
-    Section "Device"
-       Identifier     "Device0"
-       Driver         "nvidia"
-       VendorName     "NVIDIA Corporation"
-       # BusID          "PCI:0@32828:0:0"
-       Option         "HardDPMS" "false"
-       Option         "CustomEDID" "DFP-0:/etc/X11/vdisplay.edid"
-    EndSection
+  * workaround #1: unload hyperv_drm kernel module:
+    - sudo modprobe -r hyperv_drm
+  * workaround #2: Comment out BusID line in /etc/X11/xorg.conf [Device] section:
+    Section "Device"
+       Identifier     "Device0"
+       Driver         "nvidia"
+       VendorName     "NVIDIA Corporation"
+       # BusID          "PCI:0@32828:0:0"
+       Option         "HardDPMS" "false"
+       Option         "CustomEDID" "DFP-0:/etc/X11/vdisplay.edid"
+    EndSection
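
To tell whether a given system matches the conditions described in [ Impact ], the two triggers (a BusID with a large PCI domain, and hyperv_drm being loaded) can be checked with a few lines of shell. This is a minimal sketch under stated assumptions: the function names are hypothetical, the BusID is assumed to have the "PCI:<bus>@<domain>:<device>:<function>" form shown above, and the 32767 threshold is taken from the root cause analysis in this report rather than derived independently.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of any package.

# Print the PCI domain from a BusID such as "PCI:0@32828:0:0".
# Returns 1 and prints nothing if the string has no "@<domain>" part.
busid_domain() {
    case "$1" in
        PCI:*@*:*) ;;
        *) return 1 ;;
    esac
    domain=${1#*@}        # drop everything up to and including "@"
    domain=${domain%%:*}  # drop ":<device>:<function>"
    case "$domain" in
        ''|*[!0-9]*) return 1 ;;  # reject empty or non-numeric domains
    esac
    printf '%s\n' "$domain"
}

# Succeed if the BusID's domain is in the range named in the report (>= 32767).
busid_in_reported_range() {
    d=$(busid_domain "$1") || return 1
    [ "$d" -ge 32767 ]
}

# Succeed if hyperv_drm appears in a /proc/modules-style listing.
# The file argument is optional and defaults to /proc/modules.
hyperv_drm_loaded() {
    grep -q '^hyperv_drm ' "${1:-/proc/modules}" 2>/dev/null
}
```

With the BusID from the xorg.conf above, `busid_in_reported_range 'PCI:0@32828:0:0' && hyperv_drm_loaded` succeeding would suggest the system is in the configuration this report describes, in which case either workaround applies.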

-- 
You received this bug notification because you are a member of Ubuntu-X,
which is subscribed to xorg-server in Ubuntu.
https://bugs.launchpad.net/bugs/2007746

Title:
  [SRU] xserver crashes when hyperv_drm kernel module is loaded on azure
  NV series instances w/ nvidia grid driver

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/2007746/+subscriptions

