** Description changed:

- I have a server that has been running its data volume using ZFS in 20.04
- without any problem. The volume is using ZFS encryption and a raidz1-0
- configuration. I performed a scrub operation before the upgrade and it
- did not find any problem. After the reboot for the upgrade, I was
- welcomed with the following message:
+ [ Impact ]
+ Upgrading from 20.04 to 22.04 causes encrypted pools to become unmountable. This
+ is due to broken accounting metadata causing checksum errors on decrypt, which
+ makes ZFS error out early with ECKSUM.
+
+ [ Test Plan ]
+ This issue needs specific accounting metadata on the zpool to be broken, and as
+ such is somewhat tricky to reproduce organically. A regular test plan for an
+ affected pool should be:
+ 1. Set up an encrypted zpool under 20.04
+ 2. Upgrade the system to 22.04 (e.g. using the do-release-upgrade script)
+ 3. Verify that the zpool fails to mount under 22.04 (zpool status will likely
+ point to ZFS-8000-8A "Corrupted data" [0])
+
+ [0] https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A/
+
+ Thankfully, upstream has included a test scenario for this under the ZFS test
+ suite, which is run during build. The
+ tests/zfs-tests/tests/functional/userquota/13709_reproducer.bz2 file is taken
+ directly from upstream, and corresponds to an encrypted zpool with the required
+ (broken) metadata to reproduce this issue. If the ZFS test suite passes, this
+ should give us a strong signal that this issue is fixed.
+
+ [ Where problems could occur ]
+ Although I've backported the upstream test, it'd be great to have confirmation
+ from affected users that this patch resolves the issue. Additionally, we should
+ also perform upgrades on non-affected zpools as well as non-encrypted zpools, to
+ ensure no regressions have been introduced.
+
+ Considering this change affects the encrypt/decrypt code paths, problems could
+ arise in creating new encrypted zpools, as well as when mounting zpools that
+ have been previously encrypted.
+
+ [ Other Info ]
+ This SRU includes slightly more changes than the minimal changes mentioned in
+ the SRU policy, as I've also backported one of upstream's tests for encrypted
+ pools. This included a new test script (userspace_encrypted_13709.ksh), as well
+ as a binary zpool dump (13709_reproducer.bz2) that I've added under
+ d/s/include-binaries.
+
+ Considering this issue causes zpools to become unmountable, I think it's worth
+ including these in the standard ZFS test suite (similar to an autopkgtest
+ scenario for a high-risk regression). These are included in future releases of
+ zfs-linux, and as such only Jammy is affected by this regression.
+
+ --
+
+ [ Original Description ]
+ I have a server that has been running its data volume using ZFS in 20.04 without any problem. The volume is using ZFS encryption and a raidz1-0 configuration. I performed a scrub operation before the upgrade and it did not find any problem. After the reboot for the upgrade, I was welcomed with the following message:

  status: One or more devices has experienced an error resulting in data
-         corruption. Applications may be affected.
+         corruption. Applications may be affected.
  action: Restore the file in question if possible. Otherwise restore the
-         entire pool from backup.
-    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
+         entire pool from backup.
+    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A

  The volumes still do not have any checksum error but there are 5 zvols that are not accessible.
  zpool status displays a line similar to the below for each of the five:

- errors: Permanent errors have been detected in the following files:
-
-         tank/data/data:<0x0>
+ errors: Permanent errors have been detected in the following files:
+
+         tank/data/data:<0x0>

  I ran a scrub and it did not identify any problem, but the error messages are not there and the data is still not available. There are 10+ other zvols in the zpool that do not have any kind of problem. I have been unable to identify any correlation between the zvols that are failing.

  I have seen people reporting similar problems on GitHub after the 20.04 to 22.04 upgrade (see https://github.com/openzfs/zfs/issues/13763). I wonder how widespread the problem will be as more people upgrade to 22.04.

  I will try to downgrade the version of zfs in the system and report back.

  ProblemType: Bug
  DistroRelease: Ubuntu 22.04
  Package: zfsutils-linux 2.1.4-0ubuntu0.1
  ProcVersionSignature: Ubuntu 5.15.0-46.49-generic 5.15.39
  Uname: Linux 5.15.0-46-generic x86_64
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  ApportVersion: 2.20.11-0ubuntu82.1
  Architecture: amd64
  CasperMD5CheckResult: unknown
  Date: Sat Aug 20 22:24:54 2022
  ProcEnviron:
-  TERM=screen-256color
-  PATH=(custom, no user)
-  XDG_RUNTIME_DIR=<set>
-  LANG=en_US.UTF-8
-  SHELL=/bin/bash
+  TERM=screen-256color
+  PATH=(custom, no user)
+  XDG_RUNTIME_DIR=<set>
+  LANG=en_US.UTF-8
+  SHELL=/bin/bash
  SourcePackage: zfs-linux
  UpgradeStatus: Upgraded to jammy on 2022-08-20 (0 days ago)
  modified.conffile..etc.sudoers.d.zfs: [inaccessible: [Errno 13] Permission denied: '/etc/sudoers.d/zfs']

** Also affects: zfs-linux (Ubuntu Jammy)
   Importance: Undecided
       Status: New

** Changed in: zfs-linux (Ubuntu Jammy)
     Assignee: (unassigned) => Heitor Alves de Siqueira (halves)

** Changed in: zfs-linux (Ubuntu Jammy)
   Importance: Undecided => High

** Changed in: zfs-linux (Ubuntu Jammy)
       Status: New => Incomplete

** Changed in: zfs-linux (Ubuntu Jammy)
       Status: Incomplete => In Progress

** Changed in: zfs-linux (Ubuntu)
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1987190

Title:
  ZFS unrecoverable error after upgrading from 20.04 to 22.04.1

Status in zfs-linux package in Ubuntu:
  Fix Released
Status in zfs-linux source package in Jammy:
  In Progress

Bug description:
  [ Impact ]
  Upgrading from 20.04 to 22.04 causes encrypted pools to become unmountable. This
  is due to broken accounting metadata causing checksum errors on decrypt, which
  makes ZFS error out early with ECKSUM.

  [ Test Plan ]
  This issue needs specific accounting metadata on the zpool to be broken, and as
  such is somewhat tricky to reproduce organically. A regular test plan for an
  affected pool should be:
  1. Set up an encrypted zpool under 20.04
  2. Upgrade the system to 22.04 (e.g. using the do-release-upgrade script)
  3. Verify that the zpool fails to mount under 22.04 (zpool status will likely
     point to ZFS-8000-8A "Corrupted data" [0])

  [0] https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A/

  Thankfully, upstream has included a test scenario for this under the ZFS test
  suite, which is run during build. The
  tests/zfs-tests/tests/functional/userquota/13709_reproducer.bz2 file is taken
  directly from upstream, and corresponds to an encrypted zpool with the required
  (broken) metadata to reproduce this issue. If the ZFS test suite passes, this
  should give us a strong signal that this issue is fixed.
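
  Step 3 of the test plan can be checked mechanically. The following is a
  minimal Python sketch, not part of the SRU or of upstream's test suite: it
  scans text of the kind printed by zpool status for the ZFS-8000-8A message ID
  and collects the entries listed under "Permanent errors". The function names
  and the SAMPLE text are illustrative assumptions; SAMPLE is modelled on the
  output quoted in the original description.

```python
import re

# Message ID named in the test plan for the "Corrupted data" condition.
CORRUPTED_DATA_MSGID = "ZFS-8000-8A"

# Hypothetical sample, modelled on the status output quoted in this report.
SAMPLE = """\
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
errors: Permanent errors have been detected in the following files:

        tank/data/data:<0x0>
"""


def parse_zpool_status(text):
    """Return (message_ids, error_entries) found in zpool status text."""
    # Message IDs look like ZFS-8000-8A and appear in the "see:" URL.
    msgids = set(re.findall(r"ZFS-\d{4}-\w{2}", text))
    entries = []
    in_errors = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("pool:"):
            # A new pool section starts; stop collecting error entries.
            in_errors = False
        elif stripped.startswith("errors:"):
            # Only the "Permanent errors" form is followed by a list.
            in_errors = "Permanent errors" in stripped
        elif in_errors and stripped:
            entries.append(stripped)
    return msgids, entries


def pool_looks_affected(text):
    """True if the text shows ZFS-8000-8A plus at least one permanent error."""
    msgids, entries = parse_zpool_status(text)
    return CORRUPTED_DATA_MSGID in msgids and bool(entries)


if __name__ == "__main__":
    print(parse_zpool_status(SAMPLE))
    print(pool_looks_affected(SAMPLE))
```

  On a real system the text would come from running zpool status -v (for
  example through subprocess.run) rather than from SAMPLE. A positive result
  only indicates that the symptom described in this bug is present, not that
  this specific cause is responsible.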

  [ Where problems could occur ]
  Although I've backported the upstream test, it'd be great to have confirmation
  from affected users that this patch resolves the issue. Additionally, we should
  also perform upgrades on non-affected zpools as well as non-encrypted zpools, to
  ensure no regressions have been introduced.

  Considering this change affects the encrypt/decrypt code paths, problems could
  arise in creating new encrypted zpools, as well as when mounting zpools that
  have been previously encrypted.

  [ Other Info ]
  This SRU includes slightly more changes than the minimal changes mentioned in
  the SRU policy, as I've also backported one of upstream's tests for encrypted
  pools. This included a new test script (userspace_encrypted_13709.ksh), as well
  as a binary zpool dump (13709_reproducer.bz2) that I've added under
  d/s/include-binaries.

  Considering this issue causes zpools to become unmountable, I think it's worth
  including these in the standard ZFS test suite (similar to an autopkgtest
  scenario for a high-risk regression). These are included in future releases of
  zfs-linux, and as such only Jammy is affected by this regression.

  --

  [ Original Description ]
  I have a server that has been running its data volume using ZFS in 20.04
  without any problem. The volume is using ZFS encryption and a raidz1-0
  configuration. I performed a scrub operation before the upgrade and it did not
  find any problem. After the reboot for the upgrade, I was welcomed with the
  following message:

  status: One or more devices has experienced an error resulting in data
          corruption. Applications may be affected.
  action: Restore the file in question if possible. Otherwise restore the
          entire pool from backup.
     see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A

  The volumes still do not have any checksum error but there are 5 zvols that
  are not accessible.
  zpool status displays a line similar to the below for each of the five:

  errors: Permanent errors have been detected in the following files:

          tank/data/data:<0x0>

  I ran a scrub and it did not identify any problem, but the error messages are
  not there and the data is still not available. There are 10+ other zvols in
  the zpool that do not have any kind of problem. I have been unable to identify
  any correlation between the zvols that are failing.

  I have seen people reporting similar problems on GitHub after the 20.04 to
  22.04 upgrade (see https://github.com/openzfs/zfs/issues/13763). I wonder how
  widespread the problem will be as more people upgrade to 22.04.

  I will try to downgrade the version of zfs in the system and report back.

  ProblemType: Bug
  DistroRelease: Ubuntu 22.04
  Package: zfsutils-linux 2.1.4-0ubuntu0.1
  ProcVersionSignature: Ubuntu 5.15.0-46.49-generic 5.15.39
  Uname: Linux 5.15.0-46-generic x86_64
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  ApportVersion: 2.20.11-0ubuntu82.1
  Architecture: amd64
  CasperMD5CheckResult: unknown
  Date: Sat Aug 20 22:24:54 2022
  ProcEnviron:
   TERM=screen-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  SourcePackage: zfs-linux
  UpgradeStatus: Upgraded to jammy on 2022-08-20 (0 days ago)
  modified.conffile..etc.sudoers.d.zfs: [inaccessible: [Errno 13] Permission denied: '/etc/sudoers.d/zfs']