Hi all,
Newest developments:
Some time shortly after I left this computer yesterday evening, (at
least) the NVMe storage disappeared from the kernel's view. The console
showed a constant stream of journald messages about inaccessible files.
A reset via the hardware switch landed me in the UEFI interface, as the
firmware could not find any storage either. Power cycling fixed that.
Obviously, no useful log of this is available.
I rebooted into the recent kernel, but that lost the network almost
immediately. An rmmod / modprobe took a few seconds, and then, again
almost immediately, the same thing happened:
# journalctl -b --grep 'PCIe link lost' --quiet | cat
Jan 27 09:44:53 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost,
device now detached
Jan 27 09:48:05 Zwerg kernel: igc 0000:0a:00.0 (unnamed net_device)
(uninitialized): PCIe link lost, device now detached
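In case it helps anyone comparing notes: a minimal sketch for counting
these events per PCI address. The function name count_link_lost is made
up for illustration, and it assumes each log entry arrives on a single
line, as journalctl prints it (not wrapped as in this mail).

```shell
# count_link_lost (hypothetical helper): reads kernel log lines on stdin
# and prints "<pci-address> <count>" for every device that reported
# "PCIe link lost".
count_link_lost() {
    awk '/PCIe link lost/ {
        # Pick out the first domain:bus:device.function address on the line.
        if (match($0, /[0-9a-f][0-9a-f][0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]\.[0-7]/))
            n[substr($0, RSTART, RLENGTH)]++
    }
    END { for (d in n) print d, n[d] }'
}
```

Usage would be something like "journalctl -k -b | count_link_lost" for
the current boot.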
Looks like I can reproduce that as often as I like now:
[Sa Jan 27 09:48:45 2024] igc: probe of 0000:0a:00.0 failed with error -13
[Sa Jan 27 09:52:15 2024] Intel(R) 2.5G Ethernet Linux Driver
[Sa Jan 27 09:52:15 2024] Copyright(c) 2018 Intel Corporation.
[Sa Jan 27 09:52:15 2024] igc 0000:0a:00.0: PCIe PTM not supported by
PCIe bus/controller
[Sa Jan 27 09:52:15 2024] igc 0000:0a:00.0 (unnamed net_device)
(uninitialized): PCIe link lost, device now detached
[Sa Jan 27 09:52:15 2024] ------------[ cut here ]------------
[Sa Jan 27 09:52:15 2024] igc: Failed to read reg 0x10!
[Sa Jan 27 09:52:15 2024] WARNING: CPU: 19 PID: 4334 at
drivers/net/ethernet/intel/igc/igc_main.c:6482 igc_rd32+0x91/0xa0 [igc]
[Sa Jan 27 09:52:15 2024] Modules linked in: igc(+) rfcomm
cpufreq_userspace cpufreq_powersave cpufreq_ondemand
cpufreq_conservative nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4
dns_resolver nfs lockd grace fscache netfs qrtr overlay cmac algif_hash
algif_skcipher af_alg bnep sunrpc binfmt_misc nls_ascii nls_cp437 vfat
fat ext4 mbcache jbd2 btusb btrtl btbcm btintel btmtk bluetooth
jitterentropy_rng intel_rapl_msr intel_rapl_common uvcvideo edac_mce_amd
videobuf2_vmalloc drbg snd_hda_codec_hdmi videobuf2_memops ansi_cprng
videobuf2_v4l2 snd_usb_audio snd_hda_intel kvm_amd eeepc_wmi asus_nb_wmi
videobuf2_common ecdh_generic snd_intel_dspcfg asus_wmi ecc
snd_usbmidi_lib snd_intel_sdw_acpi crc16 snd_rawmidi battery videodev
snd_seq_device snd_hda_codec platform_profile kvm snd_hda_core
sparse_keymap snd_hwdep mc ledtrig_audio irqbypass snd_pcm rfkill rapl
snd_timer sp5100_tco wmi_bmof ccp snd k10temp watchdog pcspkr soundcore
joydev sg acpi_cpufreq evdev msr parport_pc ppdev lp parport fuse loop
[Sa Jan 27 09:52:15 2024] efi_pstore configfs efivarfs ip_tables
x_tables autofs4 xfs libcrc32c crc32c_generic dm_crypt dm_mod
hid_generic amdgpu usbhid crc32_pclmul hid crc32c_intel sr_mod gpu_sched
cdrom drm_buddy i2c_algo_bit ghash_clmulni_intel drm_display_helper
sha512_ssse3 cec sha512_generic rc_core drm_ttm_helper sha256_ssse3 ttm
ahci sha1_ssse3 libahci xhci_pci drm_kms_helper nvme xhci_hcd libata
nvme_core drm aesni_intel usbcore t10_pi scsi_mod crc64_rocksoft_generic
crypto_simd crc64_rocksoft cryptd crc_t10dif crct10dif_generic
crct10dif_pclmul i2c_piix4 crc64 crct10dif_common scsi_common usb_common
video wmi gpio_amdpt gpio_generic button [last unloaded: igc]
[Sa Jan 27 09:52:15 2024] CPU: 19 PID: 4334 Comm: modprobe Tainted: G
W 6.1.0-17-amd64 #1 Debian 6.1.69-1
[Sa Jan 27 09:52:15 2024] Hardware name: ASUS System Product Name/ROG
STRIX X670E-A GAMING WIFI, BIOS 1410 04/28/2023
[Sa Jan 27 09:52:15 2024] RIP: 0010:igc_rd32+0x91/0xa0 [igc]
[Sa Jan 27 09:52:15 2024] Code: 48 c7 c6 d0 c5 83 c0 e8 0b 0d 9f ca 48
8b bd 28 ff ff ff e8 31 57 56 ca 84 c0 74 b4 89 de 48 c7 c7 f8 c5 83 c0
e8 df 08 07 ca <0f> 0b eb a2 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44
00 00 41 56
[Sa Jan 27 09:52:15 2024] RSP: 0018:ffffb41e9535bbc8 EFLAGS: 00010282
[Sa Jan 27 09:52:15 2024] RAX: 0000000000000000 RBX: 0000000000000010
RCX: 0000000000000027
[Sa Jan 27 09:52:15 2024] RDX: ffff9b1e785e03a8 RSI: 0000000000000001
RDI: ffff9b1e785e03a0
[Sa Jan 27 09:52:15 2024] RBP: ffff9b1714610c28 R08: 0000000000000000
R09: ffffb41e9535ba40
[Sa Jan 27 09:52:15 2024] R10: 0000000000000003 R11: ffff9b1e97f7ffe8
R12: ffff9b1714610000
[Sa Jan 27 09:52:15 2024] R13: ffff9b1714610980 R14: ffff9b1714610000
R15: ffff9b1714610c28
[Sa Jan 27 09:52:15 2024] FS: 00007fe52dcb3040(0000)
GS:ffff9b1e785c0000(0000) knlGS:0000000000000000
[Sa Jan 27 09:52:15 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sa Jan 27 09:52:15 2024] CR2: 00007fe52d5af1f4 CR3: 000000011d41e000
CR4: 0000000000750ee0
[Sa Jan 27 09:52:15 2024] PKRU: 55555554
[Sa Jan 27 09:52:15 2024] Call Trace:
[Sa Jan 27 09:52:15 2024] <TASK>
[Sa Jan 27 09:52:15 2024] ? __warn+0x7d/0xc0
[Sa Jan 27 09:52:15 2024] ? igc_rd32+0x91/0xa0 [igc]
[Sa Jan 27 09:52:15 2024] ? report_bug+0xe2/0x150
[Sa Jan 27 09:52:15 2024] ? handle_bug+0x41/0x70
[Sa Jan 27 09:52:15 2024] ? exc_invalid_op+0x13/0x60
[Sa Jan 27 09:52:15 2024] ? asm_exc_invalid_op+0x16/0x20
[Sa Jan 27 09:52:15 2024] ? igc_rd32+0x91/0xa0 [igc]
[Sa Jan 27 09:52:15 2024] ? igc_rd32+0x91/0xa0 [igc]
[Sa Jan 27 09:52:15 2024] igc_get_invariants_base+0xb5/0x260 [igc]
[Sa Jan 27 09:52:15 2024] igc_probe+0x2b9/0x8d0 [igc]
[Sa Jan 27 09:52:15 2024] local_pci_probe+0x41/0x80
[Sa Jan 27 09:52:15 2024] pci_device_probe+0xc3/0x240
[Sa Jan 27 09:52:15 2024] really_probe+0xde/0x380
[Sa Jan 27 09:52:15 2024] ? pm_runtime_barrier+0x50/0x90
[Sa Jan 27 09:52:15 2024] __driver_probe_device+0x78/0x120
[Sa Jan 27 09:52:15 2024] driver_probe_device+0x1f/0x90
[Sa Jan 27 09:52:15 2024] __driver_attach+0xce/0x1c0
[Sa Jan 27 09:52:15 2024] ? __device_attach_driver+0x110/0x110
[Sa Jan 27 09:52:15 2024] bus_for_each_dev+0x87/0xd0
[Sa Jan 27 09:52:15 2024] bus_add_driver+0x1ae/0x200
[Sa Jan 27 09:52:15 2024] driver_register+0x89/0xe0
[Sa Jan 27 09:52:15 2024] ? 0xffffffffc1174000
[Sa Jan 27 09:52:15 2024] do_one_initcall+0x59/0x220
[Sa Jan 27 09:52:15 2024] do_init_module+0x4a/0x1f0
[Sa Jan 27 09:52:15 2024] __do_sys_finit_module+0xac/0x120
[Sa Jan 27 09:52:15 2024] do_syscall_64+0x5b/0xc0
[Sa Jan 27 09:52:15 2024] ? do_syscall_64+0x67/0xc0
[Sa Jan 27 09:52:15 2024] entry_SYSCALL_64_after_hwframe+0x64/0xce
[Sa Jan 27 09:52:15 2024] RIP: 0033:0x7fe52d720559
[Sa Jan 27 09:52:15 2024] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00
00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 8 4c 8b
4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 77 08 0d 00 f7 d8
64 89 01 48
[Sa Jan 27 09:52:15 2024] RSP: 002b:00007ffd885b6ab8 EFLAGS: 00000246
ORIG_RAX: 0000000000000139
[Sa Jan 27 09:52:15 2024] RAX: ffffffffffffffda RBX: 0000559d3674ec30
RCX: 00007fe52d720559
[Sa Jan 27 09:52:15 2024] RDX: 0000000000000000 RSI: 0000559d35c644a0
RDI: 0000000000000003
[Sa Jan 27 09:52:15 2024] RBP: 0000559d35c644a0 R08: 0000000000000000
R09: 0000559d367513f0
[Sa Jan 27 09:52:15 2024] R10: 0000000000000003 R11: 0000000000000246
R12: 0000000000040000
[Sa Jan 27 09:52:15 2024] R13: 0000000000000000 R14: 0000559d3674edc0
R15: 0000000000000000
[Sa Jan 27 09:52:15 2024] </TASK>
[Sa Jan 27 09:52:15 2024] ---[ end trace 0000000000000000 ]---
I'll now reboot into the old kernel and see if I can send this message
then :-)
...
And:
# uname -a
Linux Zwerg 6.1.0-1-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.4-1
(2023-01-07) x86_64 GNU/Linux
So far, it works. OK, I'll give this kernel another try, but the next
round will then be a backported near-bleeding-edge one, I guess.
Cheers,
Arno
--
Arno Lehmann
IT-Service Lehmann
Sandstr. 6, 49080 Osnabrück