Control: tags -1 + moreinfo

On Sat, Jan 13, 2024 at 11:45:29AM +0100, Arno Lehmann wrote:
> Package: src:linux
> Version: 6.1.69-1
> Severity: normal
> Tags: upstream
>
> Dear Maintainer,
>
> just having the computer run for a while, the network loses connection because
> the NIC detached from PCIe. I suspect this is related to power management but
> am not really sure.
>
> As this seemed to be a known problem, I added pcie_aspm=off to the kernel
> command line.
>
> The problem happens more or less randomly, the computer is usually running
> 24/7:
>
> # journalctl --grep 'PCIe link lost' --quiet | cat
> Sep 20 14:21:17 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device
> now detached
> Okt 06 05:44:20 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device
> now detached
> Okt 07 16:39:10 Zwerg kernel: igc 0000:0a:00.0 (unnamed net_device)
> (uninitialized): PCIe link lost, device now detached
> Okt 23 18:31:25 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device
> now detached
> Okt 30 11:16:06 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device
> now detached
> Okt 31 13:50:06 Zwerg kernel: igc 0000:0a:00.0 (unnamed net_device)
> (uninitialized): PCIe link lost, device now detached
> Nov 22 18:59:11 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device
> now detached
> Nov 23 15:45:49 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device
> now detached
> Dez 19 07:33:02 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device
> now detached
> Jan 01 09:57:40 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device
> now detached
> Jan 10 16:15:20 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device
> now detached
> Jan 13 11:16:31 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device
> now detached
>
> This is what I find in the kernel or system log:
>
> Jan 13 11:16:31 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device
> now detached
> Jan 13 11:16:31 Zwerg kernel: ------------[ cut here ]------------
> Jan 13 11:16:31 Zwerg kernel: igc: Failed to read reg 0xc030!
> Jan 13 11:16:31 Zwerg kernel: WARNING: CPU: 18 PID: 6389 at
> drivers/net/ethernet/intel/igc/igc_main.c:6482 igc_rd32+0x91/0xa0 [igc]
> Jan 13 11:16:31 Zwerg kernel: Modules linked in: rfcomm cpufreq_userspace
> cpufreq_powersave cpufreq_ondemand cpufreq_conservative nfsv3 nfs_acl rpcs>
> Jan 13 11:16:31 Zwerg kernel: configfs efivarfs ip_tables x_tables autofs4
> xfs libcrc32c crc32c_generic dm_crypt dm_mod hid_generic amdgpu crc32_pc>
> Jan 13 11:16:31 Zwerg kernel: CPU: 18 PID: 6389 Comm: kworker/18:1 Not
> tainted 6.1.0-17-amd64 #1 Debian 6.1.69-1
> Jan 13 11:16:31 Zwerg kernel: Hardware name: ASUS System Product Name/ROG
> STRIX X670E-A GAMING WIFI, BIOS 1410 04/28/2023
> Jan 13 11:16:31 Zwerg kernel: Workqueue: events igc_watchdog_task [igc]
> Jan 13 11:16:31 Zwerg kernel: RIP: 0010:igc_rd32+0x91/0xa0 [igc]
> Jan 13 11:16:31 Zwerg kernel: Code: 48 c7 c6 d0 55 56 c0 e8 0b 7d 6c f8 48 8b
> bd 28 ff ff ff e8 31 c7 23 f8 84 c0 74 b4 89 de 48 c7 c7 f8 55 56 c0 e>
> Jan 13 11:16:31 Zwerg kernel: RSP: 0018:ffffac56d5f13df0 EFLAGS: 00010286
> Jan 13 11:16:31 Zwerg kernel: RAX: 0000000000000000 RBX: 000000000000c030
> RCX: 0000000000000027
> Jan 13 11:16:31 Zwerg kernel: RDX: ffffa046f85a03a8 RSI: 0000000000000001
> RDI: ffffa046f85a03a0
> Jan 13 11:16:31 Zwerg kernel: RBP: ffffa03f45710c28 R08: 0000000000000000
> R09: ffffac56d5f13c68
> Jan 13 11:16:31 Zwerg kernel: R10: 0000000000000003 R11: ffffa04717f7ffe8
> R12: ffffa03f45710000
> Jan 13 11:16:31 Zwerg kernel: R13: 0000000000000000 R14: ffffa03f456efd40
> R15: 000000000000c030
> Jan 13 11:16:31 Zwerg kernel: FS: 0000000000000000(0000)
> GS:ffffa046f8580000(0000) knlGS:0000000000000000
> Jan 13 11:16:31 Zwerg kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
> 0000000080050033
> Jan 13 11:16:31 Zwerg kernel: CR2: 00007f1fc894f000 CR3: 00000008a8538000
> CR4: 0000000000750ee0
> Jan 13 11:16:31 Zwerg kernel: PKRU: 55555554
> Jan 13 11:16:31 Zwerg kernel: Call Trace:
> Jan 13 11:16:31 Zwerg kernel: <TASK>
>
> Obviously, the kernel parameter to disable PCIe power management was not
> solving this problem.
>
> The way to recover is to restart the computer.

Just to be clear, can you confirm whether or not this is a regression from a
previously running 6.1.y kernel? I'm asking because I suspect that this is
similar to
https://lore.kernel.org/intel-wired-lan/20221031170535.77be0...@kernel.org/
and that it has never worked reliably with your hardware.

Regards,
Salvatore