Hi Christian, thank you for filing this and for helping to improve Ubuntu. I have posed a question internally about your query. I will respond here and/or assign your issue accordingly once I have more info.
Best,
Tim

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-hwe-6.8 in Ubuntu.
https://bugs.launchpad.net/bugs/2100280

Title:
invalid opcode and "Microcode SW error detected" in iwlwifi during extended WiFi stress test may trigger oom

Status in linux-hwe-6.8 package in Ubuntu:
New

Bug description:
Hello there,

I am not an experienced bug reporter, but I would like to share this problem that we keep experiencing when stress testing the Intel BE200 WiFi card.

We conduct 24-hour stress tests with WiFi 7 enabled on the AP (speeds range from 200 to 1700 Mbit/s depending on the environment) to assess the card's reliability. In these tests, we frequently experience a problem where rapid allocation of all free memory by the kernel (skbuff_small_head) triggers the oom killer, which kills a random innocent userspace process. See this log line from the oom killer:

2025-02-06 14:53:13.500 skbuff_small_head 22388173KB 22388173KB

Our system has 32 GB of RAM and most of it is free (according to our monitoring) until 20 seconds before the oom event.

In the cases we analyzed, a few minutes before the oom event, iwlwifi logs what seem to be quite fatal errors. In some instances, it logs:

2025-02-06 14:44:49.639 iwlwifi 0000:05:00.0: Microcode SW error detected. Restarting 0x0.

In other instances, we see asm_exc_invalid_op in a logged call trace:

2025-02-06 14:53:20.538 ? report_bug+0x17e/0x1b0
2025-02-06 14:53:20.539 ? handle_bug+0x46/0x90
2025-02-06 14:53:20.540 ? exc_invalid_op+0x18/0x80
2025-02-06 14:53:20.540 ? asm_exc_invalid_op+0x1b/0x20
2025-02-06 14:53:20.540 ? iwl_mvm_tx_tso_segment+0x372/0x390 [iwlmvm]
2025-02-06 14:53:20.540 iwl_mvm_tx_tso.constprop.0+0x2ce/0x330 [iwlmvm]
2025-02-06 14:53:20.540 iwl_mvm_tx_skb_sta+0x11e/0x2d0 [iwlmvm]
2025-02-06 14:53:20.541 iwl_mvm_tx_skb+0x1c/0x60 [iwlmvm]
...

Now, we cannot prove causation, but the correlation is very strong.
The oom does not always happen after these iwlwifi logs but when oom happens, it is shortly after such logs (a few minutes).

I will attach journal logs of the events to this bug report.

We also tested with the Intel AX210 card and could not reproduce the problem.

We are currently testing the BE200 card with a backported iwlwifi and newer firmware from:

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git @ 1a1470d90de2a25e5befadb2f1fa30758af682ca
https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/backport-iwlwifi.git @ e35111fbbe0b932054f73c7e95b8a4db2697d265

and the problems seem to disappear. That being said, we are still actively testing. The backported driver loads the gl-c0-fm-c0-96.ucode firmware on our hardware.

Would you consider including a newer driver/firmware in the HWE stack?

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: linux-modules-6.8.0-52-generic 6.8.0-52.53~22.04.1
ProcVersionSignature: Ubuntu 6.8.0-52.53~22.04.1-generic 6.8.12
Uname: Linux 6.8.0-52-generic x86_64
ApportVersion: 2.20.11-0ubuntu82.6
Architecture: amd64
CasperMD5CheckResult: pass
Date: Wed Feb 26 10:38:40 2025
Dependencies:
InstallationDate: Installed on 2022-04-21 (1041 days ago)
InstallationMedia: Ubuntu-Server 21.10 "Impish Indri" - Release amd64 (20211013)
ProcEnviron:
TERM=xterm-256color
PATH=(custom, no user)
XDG_RUNTIME_DIR=<set>
LANG=de_DE.UTF-8
SHELL=/bin/bash
SourcePackage: linux-hwe-6.8
UpgradeStatus: Upgraded to jammy on 2022-12-01 (818 days ago)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-hwe-6.8/+bug/2100280/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help : https://help.launchpad.net/ListHelp