This bug was fixed in the package nvidia-graphics-drivers-535-server - 535.104.12-0ubuntu0.23.04.1
---------------
nvidia-graphics-drivers-535-server (535.104.12-0ubuntu0.23.04.1) lunar; urgency=medium

  * New upstream release (LP: #2037266):
    - Fixed an issue where the NVSwitch driver would not retrain NVLinks
      on init correctly on HGX 8 H100, in case they faulted earlier
      (such as due to GPU resets). This would result in links being
      down and CUDA workloads failing with "system not yet initialized"
      error. The issue was introduced in the 535.86.10 driver and fixed
      in 535.104.12 and later drivers.

 -- Alberto Milone <alberto.mil...@canonical.com>  Mon, 25 Sep 2023 16:18:16 +0000

** Changed in: nvidia-graphics-drivers-535-server (Ubuntu Lunar)
       Status: In Progress => Fix Released

** Changed in: nvidia-graphics-drivers-535 (Ubuntu Lunar)
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to nvidia-graphics-drivers-535 in Ubuntu.
https://bugs.launchpad.net/bugs/2037266

Title:
  Update the 535 driver series - 25/09/2023

Status in nvidia-graphics-drivers-535 package in Ubuntu:
  In Progress
Status in nvidia-graphics-drivers-535-server package in Ubuntu:
  In Progress
Status in nvidia-graphics-drivers-535 source package in Focal:
  Fix Released
Status in nvidia-graphics-drivers-535-server source package in Focal:
  Fix Released
Status in nvidia-graphics-drivers-535 source package in Jammy:
  Fix Released
Status in nvidia-graphics-drivers-535-server source package in Jammy:
  Fix Released
Status in nvidia-graphics-drivers-535 source package in Lunar:
  Fix Released
Status in nvidia-graphics-drivers-535-server source package in Lunar:
  Fix Released

Bug description:
  [Impact]

  These releases provide both bug fixes and new features, and we would
  like to make sure all of our users have access to these improvements.

  See the changelog entry below for a full list of changes and bugs.

  [Test Case]

  The following development and SRU process was followed:
  https://wiki.ubuntu.com/NVidiaUpdates

  Certification test suite must pass on a range of hardware:
  https://git.launchpad.net/plainbox-provider-sru/tree/units/sru.pxu

  The QA team that executed the tests will be in charge of attaching the
  artifacts and console output of the appropriate run to the bug. nVidia
  maintainers team members will not mark ‘verification-done’ until this
  has happened.

  [Regression Potential]

  In order to mitigate the regression potential, the results of the
  aforementioned system level tests are attached to this bug.

  [Discussion]

  [Changelog]

  535 (535.113.01):
  https://www.nvidia.com/Download/driverResults.aspx/211711/en-us/

  535-server (535.104.12):
  https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-535-104-12/index.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-535/+bug/2037266/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp