** Summary changed:

- kernel panic after upgrading to kernel 5.13.0-23
+ amd_sfh: Null pointer dereference on early device init causes early panic and fails to boot
** Description changed:

- After upgrading my son's Asus PN50 with Ubuntu 21.10 to the latest
- kernel 5.13.0-23, I am no longer able to boot it normally. Kernel fails
- with the panic halfway through the boot process (which got overall
- suspiciously slow):
+ BugLink: https://bugs.launchpad.net/bugs/1956519

- [ 1.359465] BUG: kernel NULL pointer dereference, address: 000000000000000c
- [ 1.359498] #PF: supervisor write access in kernel mode
- [ 1.359519] #PF: error_code(0x0002) - not-present page
- [ 1.359540] PGD 0 P4D 0
- [ 1.359553] Oops: 0002 [#1] SMP NOPTI
- [ 1.359569] CPU: 0 PID: 175 Comm: systemd-udevd Not tainted 5.13.0-23-generic #23-Ubuntu
- [ 1.359602] Hardware name: ASUSTeK COMPUTER INC. MINIPC PN50/PN50, BIOS 0623 05/13/2021
- [ 1.359632] RIP: 0010:amd_sfh_hid_client_init+0x47/0x350 [amd_sfh]
- [ 1.359661] Code: 00 53 48 83 ec 20 48 8b 5f 08 48 8b 07 48 8d b3 22 01 00 00 4c 8d b0 c8 00 00 00 e8 23 07 00 00 45 31 c0 31 c9 ba 00 00 20 00 <89> 43 0c 48 8d 83 68 01 00 00 48 8d bb 80 01 00 00 48 c7 c6 20 6d
- [ 1.359729] RSP: 0018:ffffbf71c099f9d8 EFLAGS: 00010246
- [ 1.359750] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
- [ 1.359777] RDX: 0000000000200000 RSI: ffffffffc03cd249 RDI: ffffffffa680004c
- [ 1.359804] RBP: ffffbf71c099fa20 R08: 0000000000000000 R09: 0000000000000006
- [ 1.359831] R10: ffffbf71c0d00000 R11: 0000000000000007 R12: 0000000fffffffe0
- [ 1.359857] R13: ffff992bc3387cd8 R14: ffff992bc11560c8 R15: ffff992bc3387cd8
- [ 1.359884] FS: 00007ff0ec1a48c0(0000) GS:ffff992ebf600000(0000) knlGS:0000000000000000
- [ 1.359915] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
- [ 1.359937] CR2: 000000000000000c CR3: 0000000102fd0000 CR4: 0000000000350ef0
- [ 1.359964] Call Trace:
- [ 1.359976] ? __pci_set_master+0x5f/0xe0
- [ 1.359997] amd_mp2_pci_probe+0xad/0x160 [amd_sfh]
- [ 1.360021] local_pci_probe+0x48/0x80
- [ 1.360038] pci_device_probe+0x105/0x1c0
- [ 1.360056] really_probe+0x24b/0x4c0
- [ 1.360073] driver_probe_device+0xf0/0x160
- [ 1.360091] device_driver_attach+0xab/0xb0
- [ 1.360110] __driver_attach+0xb2/0x140
- [ 1.360126] ? device_driver_attach+0xb0/0xb0
- [ 1.360145] bus_for_each_dev+0x7e/0xc0
- [ 1.360161] driver_attach+0x1e/0x20
- [ 1.360177] bus_add_driver+0x135/0x1f0
- [ 1.360194] driver_register+0x95/0xf0
- [ 1.360210] ? 0xffffffffc03d2000
- [ 1.360225] __pci_register_driver+0x57/0x60
- [ 1.360242] amd_mp2_pci_driver_init+0x23/0x1000 [amd_sfh]
- [ 1.360266] do_one_initcall+0x48/0x1d0
- [ 1.360284] ? kmem_cache_alloc_trace+0xfb/0x240
- [ 1.360306] do_init_module+0x62/0x290
- [ 1.360323] load_module+0xa8f/0xb10
- [ 1.360340] __do_sys_finit_module+0xc2/0x120
- [ 1.360359] __x64_sys_finit_module+0x18/0x20
- [ 1.360377] do_syscall_64+0x61/0xb0
- [ 1.361638] ? ksys_mmap_pgoff+0x135/0x260
- [ 1.362883] ? exit_to_user_mode_prepare+0x37/0xb0
- [ 1.364121] ? syscall_exit_to_user_mode+0x27/0x50
- [ 1.365343] ? __x64_sys_mmap+0x33/0x40
- [ 1.366550] ? do_syscall_64+0x6e/0xb0
- [ 1.367749] ? do_syscall_64+0x6e/0xb0
- [ 1.368923] ? do_syscall_64+0x6e/0xb0
- [ 1.370079] ? syscall_exit_to_user_mode+0x27/0x50
- [ 1.371227] ? do_syscall_64+0x6e/0xb0
- [ 1.372359] ? exc_page_fault+0x8f/0x170
- [ 1.373478] ? asm_exc_page_fault+0x8/0x30
- [ 1.374584] entry_SYSCALL_64_after_hwframe+0x44/0xae
- [ 1.375684] RIP: 0033:0x7ff0ec73a94d
- [ 1.376767] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b3 64 0f 00 f7 d8 64 89 01 48
- [ 1.377926] RSP: 002b:00007ffd00724ba8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
- [ 1.379076] RAX: ffffffffffffffda RBX: 000055e130084390 RCX: 00007ff0ec73a94d
- [ 1.380225] RDX: 0000000000000000 RSI: 00007ff0ec8ca3fe RDI: 0000000000000005
- [ 1.381363] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
- [ 1.382488] R10: 0000000000000005 R11: 0000000000000246 R12: 00007ff0ec8ca3fe
- [ 1.383598] R13: 000055e130083370 R14: 000055e130084480 R15: 000055e130086cb0
- [ 1.384698] Modules linked in: ahci(+) libahci i2c_piix4(+) r8169(+) amd_sfh(+) i2c_hid_acpi realtek i2c_hid xhci_pci(+) xhci_pci_renesas wmi(+) video(+) fjes(+) hid
- [ 1.385841] CR2: 000000000000000c
- [ 1.386955] ---[ end trace b2ebcacf74b788da ]---
- [ 1.388064] RIP: 0010:amd_sfh_hid_client_init+0x47/0x350 [amd_sfh]
- [ 1.389176] Code: 00 53 48 83 ec 20 48 8b 5f 08 48 8b 07 48 8d b3 22 01 00 00 4c 8d b0 c8 00 00 00 e8 23 07 00 00 45 31 c0 31 c9 ba 00 00 20 00 <89> 43 0c 48 8d 83 68 01 00 00 48 8d bb 80 01 00 00 48 c7 c6 20 6d
- [ 1.390374] RSP: 0018:ffffbf71c099f9d8 EFLAGS: 00010246
- [ 1.391560] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
- [ 1.392338] piix4_smbus 0000:00:14.0: Auxiliary SMBus Host Controller at 0xb20
- [ 1.392763] RDX: 0000000000200000 RSI: ffffffffc03cd249 RDI: ffffffffa680004c
- [ 1.395162] RBP: ffffbf71c099fa20 R08: 0000000000000000 R09: 0000000000000006
- [ 1.396372] R10: ffffbf71c0d00000 R11: 0000000000000007 R12: 0000000fffffffe0
- [ 1.397564] R13: ffff992bc3387cd8 R14: ffff992bc11560c8 R15: ffff992bc3387cd8
- [ 1.398754] FS: 00007ff0ec1a48c0(0000) GS:ffff992ebf600000(0000) knlGS:0000000000000000
- [ 1.399916] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
- [ 1.401044] CR2: 000000000000000c CR3: 0000000102fd0000 CR4: 0000000000350ef0
+ [Impact]

- Previous kernel 5.13.0-22 works alright.
+ A regression was introduced into 5.13.0-23-generic for devices using AMD
+ Ryzen chipsets that incorporate AMD Sensor Fusion Hub (SFH) HID devices,
+ which are mostly Ryzen-based laptops, but desktops do have the SOC
+ embedded as well.

- ProblemType: Bug
- DistroRelease: Ubuntu 21.10
- Package: linux-image-5.13.0-23-generic 5.13.0-23.23
- ProcVersionSignature: Ubuntu 5.13.0-22.22-generic 5.13.19
- Uname: Linux 5.13.0-22-generic x86_64
- NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
- ApportVersion: 2.20.11-0ubuntu71
- Architecture: amd64
- AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-id', '/dev/snd/controlC1', '/dev/snd/pcmC1D0c', '/dev/snd/controlC2', '/dev/snd/hwC2D0', '/dev/snd/pcmC2D0c', '/dev/snd/pcmC2D0p', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D9p', '/dev/snd/pcmC0D8p', '/dev/snd/pcmC0D7p', '/dev/snd/pcmC0D3p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
- CasperMD5CheckResult: unknown
- Date: Wed Jan 5 19:00:15 2022
- InstallationDate: Installed on 2021-01-01 (369 days ago)
- InstallationMedia: Ubuntu 20.10 "Groovy Gorilla" - Release amd64 (20201022)
- MachineType: ASUSTeK COMPUTER INC. MINIPC PN50
- ProcFB: 0 amdgpudrmfb
- ProcKernelCmdLine: BOOT_IMAGE=/BOOT/ubuntu_ct91lc@/vmlinuz-5.13.0-22-generic root=ZFS=rpool/ROOT/ubuntu_ct91lc ro quiet splash
- RelatedPackageVersions:
-   linux-restricted-modules-5.13.0-22-generic N/A
-   linux-backports-modules-5.13.0-22-generic N/A
-   linux-firmware 1.201.3
- SourcePackage: linux
- UpgradeStatus: Upgraded to impish on 2021-10-17 (80 days ago)
- WifiSyslog:
+ On early boot, when the driver initialises the device, it hits a null
+ pointer dereference with the following stack trace:

- dmi.bios.date: 05/13/2021
- dmi.bios.release: 6.23
- dmi.bios.vendor: ASUSTeK COMPUTER INC.
- dmi.bios.version: 0623
- dmi.board.asset.tag: Default string
- dmi.board.name: PN50
- dmi.board.vendor: ASUSTeK COMPUTER INC.
- dmi.board.version: To be filled by O.E.M.
- dmi.chassis.asset.tag: Default string
- dmi.chassis.type: 35
- dmi.chassis.vendor: Default string
- dmi.chassis.version: Default string
- dmi.modalias: dmi:bvnASUSTeKCOMPUTERINC.:bvr0623:bd05/13/2021:br6.23:svnASUSTeKCOMPUTERINC.:pnMINIPCPN50:pvr0623:rvnASUSTeKCOMPUTERINC.:rnPN50:rvrTobefilledbyO.E.M.:cvnDefaultstring:ct35:cvrDefaultstring:sku:
- dmi.product.family: Vivo PC
- dmi.product.name: MINIPC PN50
- dmi.product.version: 0623
- dmi.sys.vendor: ASUSTeK COMPUTER INC.
+ BUG: kernel NULL pointer dereference, address: 000000000000000c
+ #PF: supervisor write access in kernel mode
+ #PF: error_code(0x0002) - not-present page
+ PGD 0 P4D 0
+ Oops: 0002 [#1] SMP NOPTI
+ CPU: 0 PID: 175 Comm: systemd-udevd Not tainted 5.13.0-23-generic #23-Ubuntu
+ RIP: 0010:amd_sfh_hid_client_init+0x47/0x350 [amd_sfh]
+ Call Trace:
+ ? __pci_set_master+0x5f/0xe0
+ amd_mp2_pci_probe+0xad/0x160 [amd_sfh]
+ local_pci_probe+0x48/0x80
+ pci_device_probe+0x105/0x1c0
+ really_probe+0x24b/0x4c0
+ driver_probe_device+0xf0/0x160
+ device_driver_attach+0xab/0xb0
+ __driver_attach+0xb2/0x140
+ ? device_driver_attach+0xb0/0xb0
+ bus_for_each_dev+0x7e/0xc0
+ driver_attach+0x1e/0x20
+ bus_add_driver+0x135/0x1f0
+ driver_register+0x95/0xf0
+ ? 0xffffffffc03d2000
+ __pci_register_driver+0x57/0x60
+ amd_mp2_pci_driver_init+0x23/0x1000 [amd_sfh]
+ do_one_initcall+0x48/0x1d0
+ ? kmem_cache_alloc_trace+0xfb/0x240
+ do_init_module+0x62/0x290
+ load_module+0xa8f/0xb10
+ __do_sys_finit_module+0xc2/0x120
+ __x64_sys_finit_module+0x18/0x20
+ do_syscall_64+0x61/0xb0
+ ? ksys_mmap_pgoff+0x135/0x260
+ ? exit_to_user_mode_prepare+0x37/0xb0
+ ? syscall_exit_to_user_mode+0x27/0x50
+ ? __x64_sys_mmap+0x33/0x40
+ ? do_syscall_64+0x6e/0xb0
+ ? do_syscall_64+0x6e/0xb0
+ ? do_syscall_64+0x6e/0xb0
+ ? syscall_exit_to_user_mode+0x27/0x50
+ ? do_syscall_64+0x6e/0xb0
+ ? exc_page_fault+0x8f/0x170
+ ? asm_exc_page_fault+0x8/0x30
+ entry_SYSCALL_64_after_hwframe+0x44/0xae
+
+ This causes a panic and the system is unable to continue booting, and
+ the user must select an older kernel to boot.
+
+ [Fix]
+
+ The issue was introduced in 5.13.0-23-generic by the commit:
+
+ commit d46ef750ed58cbeeba2d9a55c99231c30a172764
+ commit-impish 56559d7910e704470ad72da58469b5588e8cbf85
+ Author: Evgeny Novikov <[email protected]>
+ Date: Tue Jun 1 19:38:01 2021 +0300
+ Subject: HID: amd_sfh: Fix potential NULL pointer dereference
+ Link: https://github.com/torvalds/linux/commit/d46ef750ed58cbeeba2d9a55c99231c30a172764
+
+ The issue is pretty straightforward: amd_sfh_client.c attempts to
+ dereference cl_data, but it is NULL:
+
+ $ eu-addr2line -ifae ./usr/lib/debug/lib/modules/5.13.0-23-generic/kernel/drivers/hid/amd-sfh-hid/amd_sfh.ko amd_sfh_hid_client_init+0x47
+ 0x0000000000000767
+ amd_sfh_hid_client_init
+ /build/linux-k2e9CH/linux-5.13.0/drivers/hid/amd-sfh-hid/amd_sfh_client.c:147:27
+
+ 134 int amd_sfh_hid_client_init(struct amd_mp2_dev *privdata)
+ 135 {
+ ...
+ 146
+ 147     cl_data->num_hid_devices = amd_mp2_get_sensor_num(privdata, &cl_data->sensor_idx[0]);
+ 148
+ ...
+
+ The patch moves the call to amd_sfh_hid_client_init() before
+ privdata->cl_data is actually allocated by devm_kzalloc, hence cl_data
+ being NULL.
+
+ + rc = amd_sfh_hid_client_init(privdata);
+ + if (rc)
+ +     return rc;
+ +
+   privdata->cl_data = devm_kzalloc(&pdev->dev, sizeof(struct amdtp_cl_data), GFP_KERNEL);
+   if (!privdata->cl_data)
+       return -ENOMEM;
+   ...
+ - return amd_sfh_hid_client_init(privdata);
+ + return 0;
+
+ The issue was fixed upstream in 5.15-rc4 by the commit:
+
+ commit 88a04049c08cd62e698bc1b1af2d09574b9e0aee
+ Author: Basavaraj Natikar <[email protected]>
+ Date: Thu Sep 23 17:59:27 2021 +0530
+ Subject: HID: amd_sfh: Fix potential NULL pointer dereference
+ Link: https://github.com/torvalds/linux/commit/88a04049c08cd62e698bc1b1af2d09574b9e0aee
+
+ The fix places the call to amd_sfh_hid_client_init() after
+ privdata->cl_data is allocated, and it changes the order of
+ amd_sfh_hid_client_init() to happen before devm_add_action_or_reset(),
+ fixing the actual null pointer dereference which caused these commits to
+ exist.
+
+ This patch also landed in 5.14.10 -stable, but it appears to have been
+ omitted from the impish backports, possibly because it shares the exact
+ same subject line as the regression commit and was dropped as a
+ duplicate.
+
+ [Testcase]
+
+ You need an AMD Ryzen-based system that has an AMD Sensor Fusion Hub HID
+ device built in to test this.
+
+ Simply booting the system is enough to trigger the issue.
+
+ A test kernel is available in the following ppa:
+
+ https://launchpad.net/~mruffell/+archive/ubuntu/lp1956519-test
+
+ A community user has tested the test kernel, and has confirmed that it
+ fixes the issue.
+
+ [Where problems could occur]
+
+ If a regression were to occur, it would only affect AMD Ryzen-based
+ systems with the AMD Sensor Fusion Hub HID device SOC. Since the changes
+ affect the device initialisation function, a regression could cause
+ systems to panic during boot, forcing users to revert to older kernels
+ to start their systems.
+
+ That said, the patch is present in 5.15-rc4 and 5.14.10, is in
+ widespread use, and is already present in Jammy.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1956519

Title:
  amd_sfh: Null pointer dereference on early device init causes early
  panic and fails to boot

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Impish:
  In Progress

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/1956519

  [Impact]

  A regression was introduced into 5.13.0-23-generic for devices using AMD
  Ryzen chipsets that incorporate AMD Sensor Fusion Hub (SFH) HID devices,
  which are mostly Ryzen-based laptops, but desktops do have the SOC
  embedded as well.

  On early boot, when the driver initialises the device, it hits a null
  pointer dereference with the following stack trace:

  BUG: kernel NULL pointer dereference, address: 000000000000000c
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0002) - not-present page
  PGD 0 P4D 0
  Oops: 0002 [#1] SMP NOPTI
  CPU: 0 PID: 175 Comm: systemd-udevd Not tainted 5.13.0-23-generic #23-Ubuntu
  RIP: 0010:amd_sfh_hid_client_init+0x47/0x350 [amd_sfh]
  Call Trace:
  ? __pci_set_master+0x5f/0xe0
  amd_mp2_pci_probe+0xad/0x160 [amd_sfh]
  local_pci_probe+0x48/0x80
  pci_device_probe+0x105/0x1c0
  really_probe+0x24b/0x4c0
  driver_probe_device+0xf0/0x160
  device_driver_attach+0xab/0xb0
  __driver_attach+0xb2/0x140
  ? device_driver_attach+0xb0/0xb0
  bus_for_each_dev+0x7e/0xc0
  driver_attach+0x1e/0x20
  bus_add_driver+0x135/0x1f0
  driver_register+0x95/0xf0
  ? 0xffffffffc03d2000
  __pci_register_driver+0x57/0x60
  amd_mp2_pci_driver_init+0x23/0x1000 [amd_sfh]
  do_one_initcall+0x48/0x1d0
  ? kmem_cache_alloc_trace+0xfb/0x240
  do_init_module+0x62/0x290
  load_module+0xa8f/0xb10
  __do_sys_finit_module+0xc2/0x120
  __x64_sys_finit_module+0x18/0x20
  do_syscall_64+0x61/0xb0
  ? ksys_mmap_pgoff+0x135/0x260
  ? exit_to_user_mode_prepare+0x37/0xb0
  ? syscall_exit_to_user_mode+0x27/0x50
  ? __x64_sys_mmap+0x33/0x40
  ? do_syscall_64+0x6e/0xb0
  ? do_syscall_64+0x6e/0xb0
  ? do_syscall_64+0x6e/0xb0
  ? syscall_exit_to_user_mode+0x27/0x50
  ? do_syscall_64+0x6e/0xb0
  ? exc_page_fault+0x8f/0x170
  ? asm_exc_page_fault+0x8/0x30
  entry_SYSCALL_64_after_hwframe+0x44/0xae

  This causes a panic and the system is unable to continue booting, and
  the user must select an older kernel to boot.

  [Fix]

  The issue was introduced in 5.13.0-23-generic by the commit:

  commit d46ef750ed58cbeeba2d9a55c99231c30a172764
  commit-impish 56559d7910e704470ad72da58469b5588e8cbf85
  Author: Evgeny Novikov <[email protected]>
  Date: Tue Jun 1 19:38:01 2021 +0300
  Subject: HID: amd_sfh: Fix potential NULL pointer dereference
  Link: https://github.com/torvalds/linux/commit/d46ef750ed58cbeeba2d9a55c99231c30a172764

  The issue is pretty straightforward: amd_sfh_client.c attempts to
  dereference cl_data, but it is NULL:

  $ eu-addr2line -ifae ./usr/lib/debug/lib/modules/5.13.0-23-generic/kernel/drivers/hid/amd-sfh-hid/amd_sfh.ko amd_sfh_hid_client_init+0x47
  0x0000000000000767
  amd_sfh_hid_client_init
  /build/linux-k2e9CH/linux-5.13.0/drivers/hid/amd-sfh-hid/amd_sfh_client.c:147:27

  134 int amd_sfh_hid_client_init(struct amd_mp2_dev *privdata)
  135 {
  ...
  146
  147     cl_data->num_hid_devices = amd_mp2_get_sensor_num(privdata, &cl_data->sensor_idx[0]);
  148
  ...

  The patch moves the call to amd_sfh_hid_client_init() before
  privdata->cl_data is actually allocated by devm_kzalloc, hence cl_data
  being NULL.

  + rc = amd_sfh_hid_client_init(privdata);
  + if (rc)
  +     return rc;
  +
    privdata->cl_data = devm_kzalloc(&pdev->dev, sizeof(struct amdtp_cl_data), GFP_KERNEL);
    if (!privdata->cl_data)
        return -ENOMEM;
    ...
  - return amd_sfh_hid_client_init(privdata);
  + return 0;

  The issue was fixed upstream in 5.15-rc4 by the commit:

  commit 88a04049c08cd62e698bc1b1af2d09574b9e0aee
  Author: Basavaraj Natikar <[email protected]>
  Date: Thu Sep 23 17:59:27 2021 +0530
  Subject: HID: amd_sfh: Fix potential NULL pointer dereference
  Link: https://github.com/torvalds/linux/commit/88a04049c08cd62e698bc1b1af2d09574b9e0aee

  The fix places the call to amd_sfh_hid_client_init() after
  privdata->cl_data is allocated, and it changes the order of
  amd_sfh_hid_client_init() to happen before devm_add_action_or_reset(),
  fixing the actual null pointer dereference which caused these commits to
  exist.

  This patch also landed in 5.14.10 -stable, but it appears to have been
  omitted from the impish backports, possibly because it shares the exact
  same subject line as the regression commit and was dropped as a
  duplicate.

  [Testcase]

  You need an AMD Ryzen-based system that has an AMD Sensor Fusion Hub HID
  device built in to test this.

  Simply booting the system is enough to trigger the issue.

  A test kernel is available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/lp1956519-test

  A community user has tested the test kernel, and has confirmed that it
  fixes the issue.

  [Where problems could occur]

  If a regression were to occur, it would only affect AMD Ryzen-based
  systems with the AMD Sensor Fusion Hub HID device SOC. Since the changes
  affect the device initialisation function, a regression could cause
  systems to panic during boot, forcing users to revert to older kernels
  to start their systems.

  That said, the patch is present in 5.15-rc4 and 5.14.10, is in
  widespread use, and is already present in Jammy.

