Hi Thimo, As promised yesterday, the new re-spins of the test kernels have finished building and are now available in the following ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/lp1896578-test The patches used are the ones I will be submitting for SRU, and are more or less identical to the patches in the previous test kernels I supplied in February. Please go ahead and do some testing, and let me know if you find any problems. Please note this package is NOT SUPPORTED by Canonical, and is for TESTING PURPOSES ONLY. ONLY Install in a dedicated test environment. Instructions to install: 1) sudo add-apt-repository ppa:mruffell/lp1896578-test 2) sudo apt update For 21.04 / Hirsute: 3) sudo apt install linux-image-unsigned-5.11.0-16-generic linux-modules-5.11.0-16-generic \ linux-modules-extra-5.11.0-16-generic linux-headers-5.11.0-16-generic For 20.10 / Groovy: 3) sudo apt install linux-image-unsigned-5.8.0-50-generic linux-modules-5.8.0-50-generic \ linux-modules-extra-5.8.0-50-generic linux-headers-5.8.0-50-generic For 20.04 / Focal: 3) sudo apt install linux-image-unsigned-5.4.0-72-generic linux-modules-5.4.0-72-generic \ linux-modules-extra-5.4.0-72-generic linux-headers-5.4.0-72-generic For 18.04 / Bionic: For the 5.4 Bionic HWE kernel: 3) sudo apt install linux-image-unsigned-5.4.0-72-generic linux-modules-5.4.0-72-generic \ linux-modules-extra-5.4.0-72-generic linux-headers-5.4.0-72-generic For the 4.15 Bionic GA kernel: 3) sudo apt install linux-image-unsigned-4.15.0-142-generic linux-modules-4.15.0-142-generic \ linux-modules-extra-4.15.0-142-generic linux-headers-4.15.0-142-generic 4) sudo reboot 5) uname -rv Make sure the string "+TEST1896578v20210504b1" is present in the uname -rv. You may need to modify your grub configuration to boot the correct kernel. If you need help, read these instructions: https://paste.ubuntu.com/p/XrTzWPPnWJ/ I'm still doing final regression testing, but things are looking okay so far. The deadline for patch submission to the next SRU cycle is tomorrow. I'm still planning on submitting the patches for tomorrow, but if I think we need more time for testing, worst case it will slip to the SRU cycle after, which is 3 weeks away. I will write back tomorrow with the results of my regression testing and if I have submitted the patches for SRU. Thanks, Matthew -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1907262 Title: raid10: discard leads to corrupted file system Status in linux package in Ubuntu: Fix Released Status in linux source package in Trusty: Invalid Status in linux source package in Xenial: Invalid Status in linux source package in Bionic: Fix Released Status in linux source package in Focal: Fix Released Status in linux source package in Groovy: Fix Released Bug description: Seems to be closely related to https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578 After updating the Ubuntu 18.04 kernel from 4.15.0-124 to 4.15.0-126 the fstrim command triggered by fstrim.timer causes a severe number of mismatches between two RAID10 component devices. This bug affects several machines in our company with different HW configurations (All using ECC RAM). Both, NVMe and SATA SSDs are affected. How to reproduce: - Create a RAID10 LVM and filesystem on two SSDs mdadm -C -v -l10 -n2 -N "lv-raid" -R /dev/md0 /dev/nvme0n1p2 /dev/nvme1n1p2 pvcreate -ff -y /dev/md0 vgcreate -f -y VolGroup /dev/md0 lvcreate -n root -L 100G -ay -y VolGroup mkfs.ext4 /dev/VolGroup/root mount /dev/VolGroup/root /mnt - Write some data, sync and delete it dd if=/dev/zero of=/mnt/data.raw bs=4K count=1M sync rm /mnt/data.raw - Check the RAID device echo check >/sys/block/md0/md/sync_action - After finishing (see /proc/mdstat), check the mismatch_cnt (should be 0): cat /sys/block/md0/md/mismatch_cnt - Trigger the bug fstrim /mnt - Re-Check the RAID device echo check >/sys/block/md0/md/sync_action - After finishing (see /proc/mdstat), check the mismatch_cnt (probably in the range of N*10000): cat /sys/block/md0/md/mismatch_cnt After investigating this issue on several machines it *seems* that the first drive does the trim correctly while the second one goes wild. At least the number and severity of errors found by a USB stick live session fsck.ext4 suggests this. To perform the single drive evaluation the RAID10 was started using a single drive at once: mdadm --assemble /dev/md127 /dev/nvme0n1p2 mdadm --run /dev/md127 fsck.ext4 -n -f /dev/VolGroup/root vgchange -a n /dev/VolGroup mdadm --stop /dev/md127 mdadm --assemble /dev/md127 /dev/nvme1n1p2 mdadm --run /dev/md127 fsck.ext4 -n -f /dev/VolGroup/root When starting these fscks without -n, on the first device it seems the directory structure is OK while on the second device there is only the lost+found folder left. Side-note: Another machine using HWE kernel 5.4.0-56 (after using -53 before) seems to have a quite similar issue. Unfortunately the risk/regression assessment in the aforementioned bug is not complete: the workaround only mitigates the issues during FS creation. This bug on the other hand is triggered by a weekly service (fstrim) causing severe file system corruption. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : [email protected] Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp

