Does turning the IOMMU off help? Maybe nouveau is touching something outside of its DMA range.
Anyway, this will need the nouveau devs to take a look.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1908294

Title:
  MCE on shutdown when nouveau driver loaded

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Focal:
  Confirmed
Status in linux source package in Groovy:
  Confirmed
Status in linux source package in Hirsute:
  Confirmed

Bug description:
  [Impact]
  When rebooting with the focal kernel, my system always MCEs. Installing an
  nvidia driver, or simply blacklisting the nouveau driver, avoids the issue.

  Sometimes it hard-hangs the system, requiring a manual power cycle:

  [ OK ] Reached target Reboot.
  [ 402.489755] Disabling lock debugging due to kernel taint
  [ 402.495319] mce: [Hardware Error]: CPU 24: Machine Check Exception: 5 Bank 6: bb80000000000e0b
  [ 402.503924] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff9ead91c7> {intel_idle+0x87/0x130}
  [ 402.512530] mce: [Hardware Error]: TSC 29fb4740af0 MISC d7000000
  [ 402.518622] mce: [Hardware Error]: PROCESSOR 0:50654 TIME 1601415822 SOCKET 1 APIC 40 microcode 2006906
  [ 402.527998] mce: [Hardware Error]: Run the above through 'mcelog --ascii'

  Other times it emits the MCE tombstone but goes ahead and reboots itself:

  [ OK ] Reached target Reboot.
  [ 870.372933] Disabling lock debugging due to kernel taint
  [ 870.378505] mce: [Hardware Error]: CPU 24: Machine Check Exception: 5 Bank 6: bb80000000000e0b
  [ 870.387110] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8e2d4847> {intel_idle+0x87/0x130}
  [ 870.395716] mce: [Hardware Error]: TSC 44e0f5e602c MISC d7000000
  [ 870.401801] mce: [Hardware Error]: PROCESSOR 0:50654 TIME 1589320331 SOCKET 1 APIC 40 microcode 2000064
  [ 870.411185] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
  [ 870.420531] mce: [Hardware Error]: Machine check: Processor context corrupt
  [ 870.427488] Kernel panic - not syncing: Fatal machine check
  [ 870.433108] Kernel Offset: 0xc800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
  [ 871.054820] Rebooting in 30 seconds..
  [ 900.901238] ACPI MEMORY or I/O RESET_REG.
  Copyright(c) 2015 American Megatrends, Inc.
  0x19 : Pre-memory SB Initialization.
  Copyright(c) 2016 American Megatrends, Inc.
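Until someone runs the logs through 'mcelog --ascii', the architectural part of the status word "bb80000000000e0b" can be decoded by hand. Below is a minimal Python sketch, assuming the IA32_MCi_STATUS bit layout and the compound bus/interconnect error format (000F 1PPT RRRR IILL) from the Intel SDM Vol. 3B machine-check chapter. The function names are hypothetical, and the model-specific meaning of bank 6 on this CPU is not interpreted, so mcelog remains the authoritative decoder.

```python
from typing import Optional

# Architectural flag bits of IA32_MCi_STATUS (Intel SDM Vol. 3B).
# S and AR are only meaningful on CPUs that support software error recovery.
STATUS_FLAGS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # IA32_MCi_MISC holds extra information
    58: "ADDRV",  # IA32_MCi_ADDR holds the error address
    57: "PCC",    # processor context corrupt
    56: "S",      # signaled via machine-check exception
    55: "AR",     # software recovery action required
}

# Sub-field encodings for compound bus/interconnect error codes.
PP = {0: "local CPU originated request", 1: "local CPU responded to request",
      2: "local CPU observed error as third party", 3: "generic"}
RRRR = {0: "generic error", 1: "generic read", 2: "generic write",
        3: "data read", 4: "data write", 5: "instruction fetch",
        6: "prefetch", 7: "eviction", 8: "snoop"}
II = {0: "memory access", 1: "reserved", 2: "I/O", 3: "other transaction"}
LL = {0: "level 0", 1: "level 1", 2: "level 2", 3: "generic level"}


def decode_status(status: int) -> dict:
    """Split a 64-bit status value into its flags and error-code fields."""
    return {
        "flags": [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1],
        "model_code": (status >> 16) & 0xFFFF,  # bits 31:16, model-specific
        "mca_code": status & 0xFFFF,            # bits 15:0, architectural
    }


def describe_bus_error(mca_code: int) -> Optional[str]:
    """Interpret an MCA code of the form 000F 1PPT RRRR IILL, else None."""
    # Bits 15:13 must be zero and bit 11 set; bit 12 (F, the correction
    # report filtering bit) is ignored.
    if (mca_code & 0xE800) != 0x0800:
        return None
    return ", ".join([
        "bus/interconnect error",
        PP[(mca_code >> 9) & 0b11],
        "timed out" if (mca_code >> 8) & 1 else "no timeout",
        RRRR.get((mca_code >> 4) & 0xF, "reserved request type"),
        II[(mca_code >> 2) & 0b11],
        LL[mca_code & 0b11],
    ])


if __name__ == "__main__":
    # Status word from "CPU 24: Machine Check Exception: 5 Bank 6" in the logs.
    fields = decode_status(0xBB80000000000E0B)
    print(fields["flags"])
    print(hex(fields["mca_code"]), describe_bus_error(fields["mca_code"]))
```

Under those assumptions the status has VAL, UC, EN, MISCV, PCC, S and AR set, matching the "Processor context corrupt" line, and the low code 0x0e0b falls in the bus/interconnect class with the I/O participation encoding. That is compatible with the DMA theory above, but not proof of it.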