= Verification =
I see 2 pieces to this:
1) The original report, in Comment #1, where the offending patch caused an
issue on a system where it shouldn't have - i.e., a raid0 w/ homogeneous member
sizes. We were never able to reproduce this in subsequent tests w/ the patch
applied. I know sfeole was able to perform the same MAAS install/upgrade w/ the
current -proposed kernel (I saw a test report from it), so I think we can
confidently say it is not reproducible in that build either.
2) In configs where the reverted patch *would* have prevented a raid0 from
assembling (heterogeneous member sizes), I've verified that if I create such an
array on an older kernel and then upgrade to the current -proposed kernel, it
starts automatically.
Now, of course, I continue to be susceptible to corruption, but that's known
and tracked in bug 1850540.
$ cat /proc/version
Linux version 4.15.0-66-generic (buildd@lgw01-amd64-044) (gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)) #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019
$ sudo mdadm --create /dev/md0 --run --metadata=default --homehost=akis --level=0 --raid-devices=2 /dev/vdb1 /dev/vdc1
mdadm: /dev/vdb1 appears to be part of a raid array:
    level=raid0 devices=2 ctime=Thu Oct 31 21:53:40 2019
mdadm: /dev/vdc1 appears to be part of a raid array:
    level=raid0 devices=2 ctime=Thu Oct 31 21:53:40 2019
mdadm: array /dev/md0 started.
$ sudo reboot
$ cat /proc/version
Linux version 4.15.0-68-generic (buildd@lgw01-amd64-037) (gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)) #77-Ubuntu SMP Sun Oct 27 06:02:23 UTC 2019
$ cat /proc/mdstat
Personalities : [raid0] [linear] [multipath] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : active raid0 vdc1[1] vdb1[0]
      1567744 blocks super 1.2 512k chunks

unused devices: <none>
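
(Not part of the transcript above, but for anyone re-verifying: a quick
post-reboot sanity check could look like the following, using the md127 name
from the mdstat output.)

$ sudo mdadm --detail /dev/md127 | grep -E 'Raid Level|State :'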
https://bugs.launchpad.net/bugs/1849682
Title:
[REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with
default_layout setting
Status in linux package in Ubuntu:
Incomplete
Status in linux source package in Bionic:
Fix Committed
Status in linux source package in Disco:
Incomplete
Status in linux source package in Eoan:
Incomplete
Status in linux source package in Focal:
Incomplete
Bug description:
This bug tracks the temporary revert of the upstream fix for a
corruption issue. Bug 1850540 tracks the re-application of that fix
once we have a full solution.
Users of RAID0 arrays are susceptible to a corruption issue if:
- The members of the RAID array are not all the same size[*] (a quick way to
check this is sketched after the list)
- Data has been written to the array while running kernels < 3.14 *and* >= 3.14.
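One way to check that first condition on a running system (the device names
here are just examples, not the ones from this bug):

$ cat /proc/mdstat
$ lsblk -b -o NAME,SIZE /dev/vdb1 /dev/vdc1
$ sudo mdadm --examine /dev/vdb1 /dev/vdc1 | grep -E '^/dev/|Avail Dev Size'

If the sizes (in particular the "Avail Dev Size" mdadm reports) differ
between members, the array is multi-zone and potentially affected.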
This is because of a change in v3.14 that accidentally changed how data was
written - as described in the upstream commit message:
https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9
To summarize, upstream is dealing with this by adding a versioned
layout in v5.4, and that is being backported to stable kernels - which
is why we're now seeing it. Layout version 1 is the pre-3.14 layout,
version 2 is the post-3.14 layout. Mixing version 1 & version 2 layouts can
cause corruption. However, unless a layout-version-aware kernel *created*
the array, there's no way for the kernel to know which version(s) were
used to write the existing data. This undefined mode is considered
"Version 0", and the kernel will now refuse to start these arrays w/o
user intervention.
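For what it's worth, on a kernel that still carries the layout check (i.e.
not the reverted -proposed build), the upstream commit above also adds a
raid0 module parameter, so the value can be inspected or pre-set before an
array is assembled. This assumes raid0 is built as a module so the parameter
shows up under /sys:

$ cat /sys/module/raid0/parameters/default_layout
$ echo 2 | sudo tee /sys/module/raid0/parameters/default_layout

where 0 means unset, 1 means the pre-3.14 layout, and 2 the post-3.14 layout.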
The user experience is pretty awful here. A user upgrades to the next
SRU and all of a sudden their system stops at an (initramfs) prompt. A
clueful user can spot something like the following in dmesg, which, as you
can see from the log in Comment #1, is buried in a ton of other messages:
[ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting
[ 72.728149] md/raid0: please set raid.default_layout to 1 or 2
[ 72.733979] md: pers->run() failed ...
mdadm: failed to start array /dev/md0: Unknown error 524
What that is trying to say is that you should determine if your data -
specifically the data toward the end of your array - was most likely
written with a pre-3.14 or post-3.14 kernel. Based on that, reboot
with the kernel parameter raid0.default_layout=1 or
raid0.default_layout=2 on the kernel command line. And note it should
be *raid0.default_layout* not *raid.default_layout* as the message
says - a fix for that message is now queued for stable:
https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571
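For reference, one way to set that parameter persistently on Ubuntu is via
GRUB (pick 1 or 2 based on when the data was written, as above); for a
one-off test the same parameter can instead be appended to the linux line by
pressing 'e' at the GRUB menu:

$ sudoedit /etc/default/grub    # add raid0.default_layout=2 (or =1) to GRUB_CMDLINE_LINUX_DEFAULT
$ sudo update-grub
$ sudo reboot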
IMHO, we should work with upstream to create a web page that clearly
walks the user through this process, and update the error message to
point to that page. I'd also like to see if we can detect this problem
*before* the user reboots (debconf?) and help the user fix things.
e.g. "We detected that you have RAID0 arrays that maybe susceptible to
a corruption problem", guide the user to choosing a layout, and update
the mdadm initramfs hook to poke the answer in via sysfs before
starting the array on reboot.
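To make that last idea a bit more concrete, here's a rough sketch of what
such an initramfs-tools script could look like. The script name, the
hard-coded layout value, and the reliance on the raid0 module parameter are
all assumptions, and a real hook would need to ensure it runs before mdadm's
incremental assembly kicks in:

#!/bin/sh
# Hypothetical /usr/share/initramfs-tools/scripts/init-premount/raid0-layout
# Pushes the layout the user chose (e.g. via a debconf question) into the
# raid0 module parameter before any RAID0 array is started.
PREREQ=""
prereqs() { echo "$PREREQ"; }
case "$1" in
    prereqs) prereqs; exit 0 ;;
esac

# In a real implementation this would be read from a config file written by
# the debconf question; hard-coded here purely for illustration.
RAID0_DEFAULT_LAYOUT=2

modprobe raid0 2>/dev/null || true
if [ -w /sys/module/raid0/parameters/default_layout ]; then
    echo "$RAID0_DEFAULT_LAYOUT" > /sys/module/raid0/parameters/default_layout
fi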
Note that it also seems like we should investigate backporting this to
< 3.14 kernels. Imagine a user switching between the trusty HWE kernel
and the GA kernel.
References from users of other distros:
https://blog.icod.de/2019/10/10/caution-kernel-5-3-4-and-raid0-default_layout/
https://www.linuxquestions.org/questions/linux-general-1/raid-arrays-not-assembling-4175662774/
[*] Which surprisingly is not the case reported in this bug - the user
here had a raid0 of 8 identically-sized devices. I suspect there's a
bug in the detection code somewhere.