On 10/4/2024 1:31 PM, Yonatan Avhar wrote:
[1.] One line summary of the problem:
When suspending my Lenovo ThinkPad P14s Gen 5 (Intel) with the latest
Fedora 40 kernel, the system fails to suspend, then begins to
hang/stutter every few seconds.
[2.] Full description of the problem/report:
I recently switched to a brand new Lenovo ThinkPad P14s Gen 5 and
installed Fedora KDE 40 on it, and quickly started to encounter small
hangs every few seconds (it feels like the whole desktop locks up for
~0.5s, but I don't know how to measure it).
After more troubleshooting, I found that the hangs only begin after I
try to suspend the system. When I try to suspend the system, the screen
goes black, but the system fails to suspend. The screen then turns back
on, and the hangs begin. (See the attached logs for details)
Another thing I discovered while troubleshooting: the failure and
subsequent hangs do not occur when the Ethernet link is up, or when
using the older 6.8.5 kernel (I also attached logs from booting it,
suspending, and waking up manually).
During each hang, the system monitor shows that a single core is pinned
to 100% utilization. I attached a screenshot of the graphs from the
Plasma System Monitor, which show this behavior. Additionally, I used
`btop` to check which processes were responsible for the high CPU usage.
The ones I found were "kworker/X:X-mm_percpu_wq",
"kworker/X:X-events", and "kworker/X:X-events_freezable_pwr_efficient".
A short recording of `btop` can be seen in the attached asciinema
recording btop.cast or at https://asciinema.org/a/qsMM9XRnIq0VjmVaqWQ2iyTuQ
I noticed errors from e1000e in the journal that seem to be the cause of
the issue, and don't show up when using the 6.8.5 kernel or when I
`rmmod e1000e` before suspending. They can be seen in
log-6.10.11-200.fc40.x86_64-justsleep.txt:3012 and in
log-6.10.11-200.fc40.x86_64-with-driver-reset.txt:3012.
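For anyone who wants to pull the same driver messages out of their own
journal, here is a minimal sketch. It assumes a systemd journal is
available; the helper name `e1000e_lines` is made up for illustration and
simply filters whatever log text it is given on stdin.

```shell
#!/bin/sh
# Keep only the log lines that mention the e1000e driver.
# Reads log text on stdin and writes the matching lines to stdout.
e1000e_lines() {
    # grep exits non-zero when nothing matches; treat that as "no lines".
    grep -F 'e1000e' || true
}

# Example (not run here): filter the kernel messages of the current boot.
#   journalctl -k -b --no-pager | e1000e_lines
```

Running the example right after a failed suspend should show whether the
driver logged anything around that time.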
The exact model number is "Lenovo ThinkPad P14s Gen 5 21G2001VUS", exact
specs from Lenovo are here:
https://psref.lenovo.com/Detail/ThinkPad_P14s_Gen_5_Intel?M=21G2001VUS
[4.] Kernel information
[4.1.] Kernel version (from /proc/version): Linux version
6.10.11-200.fc40.x86_64 (mockbuild@3ca6e723992940d59a04517d5d4c6213)
(gcc (GCC) 14.2.1 20240912 (Red Hat 14.2.1-3), GNU ld version
2.41-37.fc40) #1 SMP PREEMPT_DYNAMIC Wed Sep 18 21:09:58 UTC 2024
[4.2.] Kernel .config file:
See attached files config-6.8.5-301.fc40.x86_64 and
config-6.10.11-200.fc40.x86_64
[5.] Most recent kernel version which did not have the bug:
6.8.5-301.fc40.x86_64. I did not test any kernels in between,
simply because these are the kernels the Fedora Linux repos offer.
[7.] A small shell script or example program which triggers the
problem (if possible)
I don't have a script that triggers the problem. Instructions to
reproduce the problem are as follows:
1. Boot the system and log in
2. Wait for the desktop to load (I also tried switching to a TTY)
3. Run `systemctl suspend`
4. The screen goes black, but the system fails to suspend. The screen
then turns back on, and the hangs begin.
[8.] Environment
[8.1.] Software:
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.
Linux Yonatan-P14s 6.10.11-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC
Wed Sep 18 21:09:58 UTC 2024 x86_64 GNU/Linux
GNU C 14
GNU Make 4.4.1
Binutils 2.41
Util-linux 2.40.1
Mount 2.40.1
Module-init-tools 31
E2fsprogs 1.47.0
Jfsutils 1.1.15
Xfsprogs 6.5.0
Quota-tools 4.09
PPP 2.5.0
Nfs-utils 2.7.1
Bison 3.8.2
Flex 2.6.4
Linux C++ Library 6.0.33
Dynamic linker (ldd) 2.39
Procps 4.0.4
Net-tools 2.10
Kbd 2.6.4
Console-tools 2.6.4
Sh-utils 9.4
Udev 255
Modules Loaded ac97_bus acpi_pad acpi_tad
acpi_thermal_rel binfmt_misc bluetooth bnep btbcm btintel btmtk btrtl
btusb cec cfg80211 coretemp crc32c_intel crc32_pclmul crct10dif_pclmul
drm_buddy drm_display_helper drm_exec drm_gpuvm drm_suballoc_helper
drm_ttm_helper e1000e fat firmware_attributes_class fuse
ghash_clmulni_intel gpu_sched hid_multitouch i2c_algo_bit i2c_dev
i2c_hid i2c_hid_acpi i2c_i801 i2c_smbus i915 idma64 igen6_edac
int3400_thermal int3403_thermal int340x_thermal_zone intel_cstate
intel_hid intel_pmc_bxt intel_pmc_core intel_powerclamp
intel_rapl_common intel_rapl_msr intel_uncore intel_uncore_frequency
intel_uncore_frequency_common intel_vpu intel_vsec ip6_tables ip_set
ip_tables iTCO_vendor_support iTCO_wdt iwlmvm iwlwifi joydev kvm
kvm_intel libarc4 loop mac80211 mc mei mei_gsc_proxy mei_me mtd
nf_conntrack nf_conntrack_broadcast nf_conntrack_netbios_ns
nf_defrag_ipv4 nf_defrag_ipv6 nf_nat nfnetlink nf_reject_ipv4
nf_reject_ipv6 nf_tables nft_chain_nat nft_ct nft_fib nft_fib_inet
nft_fib_ipv4 nft_fib_ipv6 nft_reject nft_reject_inet nvidia nvidia_drm
nvidia_modeset nvidia_uvm nvme nvme_auth nvme_core pcspkr
pinctrl_meteorlake platform_profile pmt_class pmt_telemetry
polyval_clmulni polyval_generic processor_thermal_device
processor_thermal_device_pci processor_thermal_mbox
processor_thermal_power_floor processor_thermal_rapl
processor_thermal_rfim processor_thermal_wt_hint
processor_thermal_wt_req qrtr rapl rfcomm rfkill serio_raw sha1_ssse3
sha256_ssse3 sha512_ssse3 snd snd_compress snd_ctl_led snd_hda_codec
snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_codec_realtek
snd_hda_core snd_hda_ext_core snd_hda_intel snd_hda_scodec_component
snd_hrtimer snd_hwdep snd_intel_dspcfg snd_intel_sdw_acpi snd_pcm
snd_pcm_dmaengine snd_seq snd_seq_device snd_seq_dummy snd_soc_acpi
snd_soc_acpi_intel_match snd_soc_core snd_soc_dmic snd_soc_hdac_hda
snd_soc_hdac_hdmi snd_soc_intel_hda_dsp_common snd_soc_skl_hda_dsp
snd_sof snd_sof_intel_hda snd_sof_intel_hda_common
snd_sof_intel_hda_generic snd_sof_intel_hda_mlink snd_sof_pci
snd_sof_pci_intel_mtl snd_sof_probes snd_sof_utils snd_sof_xtensa_dsp
snd_timer soundcore soundwire_bus soundwire_cadence
soundwire_generic_allocation soundwire_intel sparse_keymap spi_intel
spi_intel_pci spi_nor sunrpc think_lmi thinkpad_acpi thunderbolt ttm
typec typec_ucsi ucsi_acpi uinput uvc uvcvideo vfat video
videobuf2_common videobuf2_memops videobuf2_v4l2 videobuf2_vmalloc
videodev wmi wmi_bmof x86_pkg_temp_thermal xe zram
Processor information, module information, loaded driver and hardware
information, and PCI information are attached as files. Note that SCSI
info is not included since the laptop does not have any SCSI devices.
[X.] Other notes, patches, fixes, workarounds:
My current workaround for this is to add the following script to
/etc/systemd/system-sleep/:
#!/bin/sh
case $1/$2 in
    pre/*)
        echo "Unloading Intel e1000e driver"
        rmmod e1000e
        ;;
    post/*)
        echo "Loading Intel e1000e driver"
        modprobe e1000e
        ;;
esac
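The dispatch logic of a hook like this can be checked without touching
the driver. The sketch below is a dry-run variant: `hook_action` is a
hypothetical helper that only prints the command the hook would run for
the phase systemd passes as the first argument ("pre" or "post"). Note
that systemd only runs hook files that are marked executable.

```shell
#!/bin/sh
# Print the command the sleep hook would run for a given phase,
# without actually loading or unloading any module.
# $1 is the phase passed by systemd: "pre" or "post".
hook_action() {
    case $1 in
        pre)  echo "rmmod e1000e" ;;
        post) echo "modprobe e1000e" ;;
        *)    echo "no action" ;;
    esac
}

hook_action pre
hook_action post
```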
I would be happy to provide any additional information or test any
changes, since this seems to be specific to this laptop model and not a
universal issue. Unfortunately I only have access to this one unit, so I
can't check multiple machines with the same configuration.
Thanks in advance
Yonatan Avhar
Hi Yonatan,
Thank you for the great analysis.
Recently, we submitted a few patches to the upstream kernel that should
take care of this kind of error.
I see that you are currently running a custom Fedora kernel, so I would
like to ask you to reproduce this bug with kernel 6.12-rc1, which
includes commit 0a6ad4d9e169 (e1000e: avoid failing the system during
pm_suspend).
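Before retesting, it may help to confirm that the machine really booted
the newer kernel. A rough sketch, assuming a `sort` that supports version
ordering with `-V` (GNU coreutils does); `kernel_at_least` is a
hypothetical helper name, not an existing tool:

```shell
#!/bin/sh
# Succeed if the version string in $1 is at least the version in $2.
# Relies on `sort -V`: the smaller of the two versions sorts first.
kernel_at_least() {
    running=$1
    required=$2
    lowest=$(printf '%s\n%s\n' "$running" "$required" | sort -V | head -n 1)
    [ "$lowest" = "$required" ]
}

# Compare the running kernel against 6.12.
if kernel_at_least "$(uname -r)" 6.12; then
    echo "running kernel is 6.12 or newer"
else
    echo "running kernel is older than 6.12"
fi
```

This compares version strings only. It does not prove that a given
commit is present in a distribution kernel with backported patches.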
If the issue still persists, please try to apply:
https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20241001170848.1191876-1-vitaly.lifsh...@intel.com/