Hi Christian, thank you for filing this and for helping to improve
Ubuntu.

I have posed a question internally about your query. I will respond here
and/or assign your issue accordingly once I have more info.

Best,
Tim

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-hwe-6.8 in Ubuntu.
https://bugs.launchpad.net/bugs/2100280

Title:
  invalid opcode and "Microcode SW error detected" in iwlwifi during
  extended WiFi stress test may trigger oom

Status in linux-hwe-6.8 package in Ubuntu:
  New

Bug description:
  Hello there,

  I am not an experienced bug reporter, but I would like to share a problem
  that we keep experiencing when stress testing the Intel BE200 WiFi card.
  We conduct 24-hour stress tests with WiFi 7 enabled on the AP (speeds range
  from 200 to 1700 Mbit/s depending on the environment) to assess the card's
  reliability.

  In these tests, we frequently experience a problem where the kernel rapidly
  allocates all free memory (skbuff_small_head), which triggers the OOM killer
  and kills a random innocent userspace process. See this log line from the
  OOM killer:

  2025-02-06 14:53:13.500 skbuff_small_head   22388173KB   22388173KB

  Our system has 32 GB of RAM and most of it is free (according to our
  monitoring) until 20 seconds before the oom event. The bug occurred

  In the cases we analyzed, a few minutes before the oom event, iwlwifi logs
  what appear to be fatal errors. In some instances, it logs:

  2025-02-06 14:44:49.639 iwlwifi 0000:05:00.0: Microcode SW error detected. Restarting 0x0.

  in other instances, we see asm_exc_invalid_op:

  2025-02-06 14:53:20.538        ? report_bug+0x17e/0x1b0
  2025-02-06 14:53:20.539        ? handle_bug+0x46/0x90
  2025-02-06 14:53:20.540        ? exc_invalid_op+0x18/0x80
  2025-02-06 14:53:20.540        ? asm_exc_invalid_op+0x1b/0x20
  2025-02-06 14:53:20.540        ? iwl_mvm_tx_tso_segment+0x372/0x390 [iwlmvm]
  2025-02-06 14:53:20.540        iwl_mvm_tx_tso.constprop.0+0x2ce/0x330 [iwlmvm]
  2025-02-06 14:53:20.540        iwl_mvm_tx_skb_sta+0x11e/0x2d0 [iwlmvm]
  2025-02-06 14:53:20.541        iwl_mvm_tx_skb+0x1c/0x60 [iwlmvm]
  ...

  in a logged call trace. Now, we cannot prove causation, but the correlation
  is very strong. The oom does not always happen after these iwlwifi logs, but
  when it does happen, it follows such logs shortly (within a few minutes).
  I will attach journal logs of the events to this bug report.
  We also tested with the Intel AX210 card and could not reproduce the problem.

  We are currently testing the BE200 card with a backported iwlwifi and newer
  firmware from:
  https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git @ 1a1470d90de2a25e5befadb2f1fa30758af682ca
  https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/backport-iwlwifi.git @ e35111fbbe0b932054f73c7e95b8a4db2697d265
  and the problems seem to disappear. That being said, we are still actively
  testing.
  The backported driver loads the gl-c0-fm-c0-96.ucode firmware on our hardware.

  Would you consider including a newer driver/firmware in the HWE stack?

  ProblemType: Bug
  DistroRelease: Ubuntu 22.04
  Package: linux-modules-6.8.0-52-generic 6.8.0-52.53~22.04.1
  ProcVersionSignature: Ubuntu 6.8.0-52.53~22.04.1-generic 6.8.12
  Uname: Linux 6.8.0-52-generic x86_64
  ApportVersion: 2.20.11-0ubuntu82.6
  Architecture: amd64
  CasperMD5CheckResult: pass
  Date: Wed Feb 26 10:38:40 2025
  Dependencies:
   
  InstallationDate: Installed on 2022-04-21 (1041 days ago)
  InstallationMedia: Ubuntu-Server 21.10 "Impish Indri" - Release amd64 (20211013)
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=de_DE.UTF-8
   SHELL=/bin/bash
  SourcePackage: linux-hwe-6.8
  UpgradeStatus: Upgraded to jammy on 2022-12-01 (818 days ago)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-hwe-6.8/+bug/2100280/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
