Control: reassign -1 kernel Hi,
Although as you said, it may be something else, as it's the kernel logging this, I reassign to the kernel package, in the hope they are more able to pinpoint where the problem lies. Paul On 24-08-2021 10:53, hox...@noramail.jp wrote: > Package: upgrade-reports > Severity: normal > > Dear maintainers, > > I've upgraded one of my buster 10.10 into bullseye 11. > > * Upgrading was mostly successfull. > > * As "buster" GNOME desktop it had almost no problem, > and upgraded "bullseye" GNOME session seems fine for now. > > However, after rebooting the issue below stared on ttys. > > The machine has one NVMe and one SATA SSD (separated /home). > The Entire system installed into LUKS2 encrypted LVM volumes. > Mostly "main" packages except Intel microcode ("non-free"). > > NVMe continual syslog messsage on terminal > ========================================== > > 1. kernel reports nvme RxErr related messages. > 2. on any tty (even in rescue.target). > 3. almost always (I/O access on NVMe seems to be a trigger). > > AFAIK, the reported device (WD BLACK NVMe) works fine > and "nvme smartlog" have not reported any error in years. > > After bullseye upgrading kernel started this reporting on ttys. > > At the same time, "smartd" seems to have a minor systemd unit problem, > and it also seems to start tracking NVMe just like below syslog sample > (at the smartd starting log section). > "smartd" in buster only monitored SATA SSD. > > I can not tell what actual package and/or whether that device caused this, > posting this bug report to "upgrade-reports". > > Actual syslog (partially modified; DATETIME/HOSTNAME/SERIAL_NUMBER) > --------------------------------------------------------------------- > > 1. Cold booted. > 2. systemctl shows no error. > 3. target was shifted during this log sampling (graphical, rescue, multiuser, > graphical). > > syslog and my comments follow: > > Aug 23 19:50:27 hostname kernel: [ 0.289634] pci 0000:01:00.0: [15b7:5002] > type 00 class 0x010802 > Aug 23 19:50:27 hostname kernel: [ 0.289657] pci 0000:01:00.0: reg 0x10: > [mem 0xdf000000-0xdf003fff 64bit] > Aug 23 19:50:27 hostname kernel: [ 0.289691] pci 0000:01:00.0: reg 0x20: > [mem 0xdf004000-0xdf0040ff 64bit] > Aug 23 19:50:27 hostname kernel: [ 0.718254] pci 0000:01:00.0: Adding to > iommu group 11 > Aug 23 19:50:27 hostname kernel: [ 1.020213] nvme nvme0: pci function > 0000:01:00.0 > Aug 23 19:50:27 hostname kernel: [ 1.029047] nvme nvme0: 4/0/0 > default/read/poll queues > Aug 23 19:50:27 hostname kernel: [ 1.031074] nvme0n1: p1 p2 p3 > > This NVMe contains EFI, /boot, and LUKS root filesystem except /home (in SATA > SSD). > > Aug 23 19:50:27 hostname kernel: [ 15.626412] input: HDA Intel PCH > HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:1f.3/sound/card0/input18 > Aug 23 19:50:27 hostname kernel: [ 15.626483] input: HDA Intel PCH > HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:1f.3/sound/card0/input19 > Aug 23 19:50:27 hostname kernel: [ 15.626548] input: HDA Intel PCH > HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:1f.3/sound/card0/input20 > > The first NVMe RxErr reporting starts after the kernel dmesg (after sound > subsystem). > > Aug 23 19:50:27 hostname kernel: [ 17.365057] pcieport 0000:00:1b.0: AER: > Corrected error received: 0000:01:00.0 > Aug 23 19:50:27 hostname kernel: [ 17.365089] nvme 0000:01:00.0: PCIe Bus > Error: severity=Corrected, type=Physical Layer, (Receiver ID) > Aug 23 19:50:27 hostname kernel: [ 17.365117] nvme 0000:01:00.0: device > [15b7:5002] error status/mask=00000001/0000e000 > Aug 23 19:50:27 hostname kernel: [ 17.365142] nvme 0000:01:00.0: [ 0] > RxErr > > Then, on any tty (regardless the login status), > it keep showing that messages like this. > DATETIME was modified but kernel timing is the real log. > > Aug 23 19:51:16 hostname kernel: [ 70.416916] pcieport 0000:00:1b.0: AER: > Corrected error received: 0000:01:00.0 > Aug 23 19:51:16 hostname kernel: [ 70.417029] nvme 0000:01:00.0: PCIe Bus > Error: severity=Corrected, type=Physical Layer, (Receiver ID) > Aug 23 19:51:16 hostname kernel: [ 70.417145] nvme 0000:01:00.0: device > [15b7:5002] error status/mask=00000001/0000e000 > Aug 23 19:51:16 hostname kernel: [ 70.417268] nvme 0000:01:00.0: [ 0] > RxErr > Aug 23 19:51:17 hostname kernel: [ 71.693968] pcieport 0000:00:1b.0: AER: > Corrected error received: 0000:01:00.0 > Aug 23 19:51:17 hostname kernel: [ 71.699967] nvme 0000:01:00.0: PCIe Bus > Error: severity=Corrected, type=Physical Layer, (Receiver ID) > Aug 23 19:51:17 hostname kernel: [ 71.701511] nvme 0000:01:00.0: device > [15b7:5002] error status/mask=00000001/0000e000 > Aug 23 19:51:17 hostname kernel: [ 71.703040] nvme 0000:01:00.0: [ 0] > RxErr > Aug 23 19:51:19 hostname kernel: [ 73.739331] pcieport 0000:00:1b.0: AER: > Corrected error received: 0000:01:00.0 > Aug 23 19:51:19 hostname kernel: [ 73.745023] nvme 0000:01:00.0: PCIe Bus > Error: severity=Corrected, type=Physical Layer, (Receiver ID) > Aug 23 19:51:19 hostname kernel: [ 73.746547] nvme 0000:01:00.0: device > [15b7:5002] error status/mask=00000001/0000e000 > Aug 23 19:51:19 hostname kernel: [ 73.748053] nvme 0000:01:00.0: [ 0] > RxErr > Aug 23 19:51:20 hostname kernel: [ 74.506475] pcieport 0000:00:1b.0: AER: > Corrected error received: 0000:01:00.0 > Aug 23 19:51:20 hostname kernel: [ 74.512092] nvme 0000:01:00.0: PCIe Bus > Error: severity=Corrected, type=Physical Layer, (Receiver ID) > Aug 23 19:51:20 hostname kernel: [ 74.513594] nvme 0000:01:00.0: device > [15b7:5002] error status/mask=00000001/0000e000 > Aug 23 19:51:20 hostname kernel: [ 74.515037] nvme 0000:01:00.0: [ 0] > RxErr > Aug 23 19:51:22 hostname kernel: [ 76.552810] pcieport 0000:00:1b.0: AER: > Corrected error received: 0000:01:00.0 > Aug 23 19:51:22 hostname kernel: [ 76.558081] nvme 0000:01:00.0: PCIe Bus > Error: severity=Corrected, type=Physical Layer, (Receiver ID) > Aug 23 19:51:22 hostname kernel: [ 76.559548] nvme 0000:01:00.0: device > [15b7:5002] error status/mask=00000001/0000e000 > Aug 23 19:51:22 hostname kernel: [ 76.560989] nvme 0000:01:00.0: [ 0] > RxErr > Aug 23 19:51:24 hostname kernel: [ 78.599100] pcieport 0000:00:1b.0: AER: > Corrected error received: 0000:01:00.0 > Aug 23 19:51:24 hostname kernel: [ 78.604922] nvme 0000:01:00.0: PCIe Bus > Error: severity=Corrected, type=Physical Layer, (Receiver ID) > Aug 23 19:51:24 hostname kernel: [ 78.606389] nvme 0000:01:00.0: device > [15b7:5002] error status/mask=00000001/0000e000 > Aug 23 19:51:24 hostname kernel: [ 78.607832] nvme 0000:01:00.0: [ 0] > RxErr > > then smartd initialization follows, > > Aug 24 11:34:33 hostname smartd[1026]: smartd 6.6 2017-11-05 r4594 > [x86_64-linux-4.19.0-17-amd64] (local build) > Aug 24 11:34:33 hostname smartd[1026]: Copyright (C) 2002-17, Bruce Allen, > Christian Franke, www.smartmontools.org > Aug 24 11:34:33 hostname smartd[1026]: Opened configuration file > /etc/smartd.conf > Aug 24 11:34:33 hostname smartd[1026]: Drive: DEVICESCAN, implied '-a' > Directive on line 21 of file /etc/smartd.conf > Aug 24 11:34:33 hostname smartd[1026]: Configuration file /etc/smartd.conf > was parsed, found DEVICESCAN, scanning devices > Aug 24 11:34:33 hostname smartd[1026]: Device: /dev/sda, type changed from > 'scsi' to 'sat' > Aug 24 11:34:33 hostname smartd[1026]: Device: /dev/sda [SAT], opened > > (minor issue) smartd.service (please see the log). > > Aug 23 19:23:18 hostname systemd[1]: /lib/systemd/system/smartd.service:9: > Standard output type syslog is obsolete, automatically updating to journal. > Please update your unit file, and consider removing the setting altogether. > Aug 23 19:23:18 hostname systemd[1]: Reloading. > Aug 23 19:23:18 hostname systemd[1]: /lib/systemd/system/smartd.service:9: > Standard output type syslog is obsolete, automatically updating to journal. > Please update your unit file, and consider removing the setting altogether. > Aug 23 19:23:18 hostname systemd[1]: Starting Network Time Synchronization... > Aug 23 19:23:18 hostname systemd[1]: Started Network Time Synchronization. > Aug 23 19:23:18 hostname systemd[1]: Reached target System Time Set. > > FYI: buster smartd did not track NVMe device, only SATA SSD. > bullseye smartd shows it monitor both. > This might be a potential cause. > (info: S/N: section is not the real one) > > Note that smartd 7.2 shows 4.19 kernel version, > but this session is booted with bullseye 5.10 > (4.19 image was purged, and "apt autoremove --purge" had done). > > Aug 23 19:26:12 hostname smartd[28840]: smartd 7.2 2020-12-30 r5155 > [x86_64-linux-4.19.0-17-amd64] (local build) > Aug 23 19:26:12 hostname smartd[28840]: Copyright (C) 2002-20, Bruce Allen, > Christian Franke, www.smartmontools.org > Aug 23 19:26:12 hostname smartd[28840]: Opened configuration file > /etc/smartd.conf > Aug 23 19:26:12 hostname smartd[28840]: Drive: DEVICESCAN, implied '-a' > Directive on line 21 of file /etc/smartd.conf > Aug 23 19:26:12 hostname smartd[28840]: Configuration file /etc/smartd.conf > was parsed, found DEVICESCAN, scanning devices > Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/sda, type changed from > 'scsi' to 'sat' > Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/sda [SAT], opened > Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/sda [SAT], > CT500MX500SSD1, S/N:SDASN_HEX, WWN:5-00a075-1e1e6e854, FW:M3CR023, 500 GB > Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/sda [SAT], found in > smartd database: Crucial/Micron Client SSDs > Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/sda [SAT], is SMART > capable. Adding to "monitor" list. > Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/sda [SAT], state read > from /var/lib/smartmontools/smartd.CT500MX500SSD1-SDASN_HEX.ata.state > Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/nvme0, opened > Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/nvme0, > WDS250G3X0C-00SJG0, S/N:NVME0SN_HEX, FW:102000WD, 250 GB > Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/nvme0, is SMART capable. > Adding to "monitor" list. > Aug 23 19:26:12 hostname smartd[28840]: Monitoring 1 ATA/SATA, 0 SCSI/SAS and > 1 NVMe devices > Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/sda [SAT], state written > to /var/lib/smartmontools/smartd.CT500MX500SSD1-SDASN_HEX.ata.state > Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/nvme0, state written to > /var/lib/smartmontools/smartd.WDS250G3X0C_00SJG0-NVME0SN_HEX.nvme.state > Aug 23 19:26:12 hostname systemd[1]: Started Self Monitoring and Reporting > Technology (SMART) Daemon. > Aug 23 19:26:15 hostname systemd[1]: Reloading. > > "lspci" is here (as bullseye output). > > I'm not sure about 01:00.1 NVMe product name is correct. > Probably this one is WD Black NVMe 2nd gen. or 3rd gen. > > 00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v6/7th Gen Core > Processor Host Bridge/DRAM Registers [8086:591f] (rev 05) > 00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 630 > [8086:5912] (rev 04) > 00:14.0 USB controller [0c03]: Intel Corporation 200 Series/Z370 Chipset > Family USB 3.0 xHCI Controller [8086:a2af] > 00:16.0 Communication controller [0780]: Intel Corporation 200 Series PCH > CSME HECI #1 [8086:a2ba] > 00:17.0 SATA controller [0106]: Intel Corporation 200 Series PCH SATA > controller [AHCI mode] [8086:a282] > 00:1b.0 PCI bridge [0604]: Intel Corporation 200 Series PCH PCI Express Root > Port #17 [8086:a2e7] (rev f0) > 00:1c.0 PCI bridge [0604]: Intel Corporation 200 Series PCH PCI Express Root > Port #1 [8086:a290] (rev f0) > 00:1c.6 PCI bridge [0604]: Intel Corporation 200 Series PCH PCI Express Root > Port #7 [8086:a296] (rev f0) > 00:1d.0 PCI bridge [0604]: Intel Corporation 200 Series PCH PCI Express Root > Port #9 [8086:a298] (rev f0) > 00:1f.0 ISA bridge [0601]: Intel Corporation 200 Series PCH LPC Controller > (Z270) [8086:a2c5] > 00:1f.2 Memory controller [0580]: Intel Corporation 200 Series/Z370 Chipset > Family Power Management Controller [8086:a2a1] > 00:1f.3 Audio device [0403]: Intel Corporation 200 Series PCH HD Audio > [8086:a2f0] > 00:1f.4 SMBus [0c05]: Intel Corporation 200 Series/Z370 Chipset Family SMBus > Controller [8086:a2a3] > 00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (2) > I219-V [8086:15b8] > 01:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD Black > 2018/SN750 / PC SN720 NVMe SSD [15b7:5002] > > Please tell me if there are something I can do for this. > > Thanks for all maintainers. > > Regards, >
OpenPGP_signature
Description: OpenPGP digital signature