Re: /boot on btrfs

Chris Murphy Fri, 18 Nov 2016 14:47:49 -0800

Some of this is a repeat, as this comes up from time to time.

Neither Red Hat nor Fedora currently has Btrfs developers to manage any
Btrfs regressions that come up. And as the upstream development is still
very busy, there are regressions, and there's every reason to believe
Fedora users would get disproportionately impacted because Fedora kernels
tend to be very recent.


SUSE is in a sweet spot because they have a bunch of Btrfs developers, and
they are running longterm kernels. So it is upstream testers, including
Fedora users, who hit regressions and get them reported and fixed
before they could ever appear in a longterm kernel on any of the SUSE
offerings. And because of their strong Btrfs developer presence, they're
doing a lot of stable backports.

I think Fedora could soon do Btrfs with single device only. But I think it
requires community work to integrate Btrfs regression tests (these exist
already) into Fedora infrastructure, so that every built kernel runs
those tests. And then some interpretation is needed to know whether a
particular failure is tolerable, users need notification, or the build
needs to be failed. The details of this need to be reviewed by the kernel
team. Right now they can't do that work themselves; they have plenty on
their plate already.

As for stability, it's kinda complicated. Single device Btrfs is stable,
unless the device lies about committing to disk when it hasn't, and as it
turns out devices do transiently corrupt data. Btrfs is going to give a
heads up when that happens, and if it's bad, as in further writes can
corrupt the filesystem, it'll go read only. Other filesystems tolerate this
condition far longer. What's interesting is both XFS and ext4 now default
at mkfs time to checksumming metadata. So they can catch similar problems,
just not corruption of actual file data, which is a much larger target
and can still cause further system corruption if it goes unchecked.
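
To make the data checksumming point concrete, here is a minimal Python
sketch of per-block checksums catching a silently flipped bit. It is an
illustration only, not how Btrfs is implemented: the 4096 byte block size,
the function names, and the use of zlib.crc32 are all assumptions chosen
for the example, and zlib.crc32 is a stand-in rather than the checksum
Btrfs stores on disk.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def write_blocks(data: bytes):
    """Split data into fixed-size blocks and record a checksum per block."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    checksums = [zlib.crc32(block) for block in blocks]
    return blocks, checksums


def verify_blocks(blocks, checksums):
    """Return the indices of blocks that no longer match their checksum."""
    return [
        index
        for index, (block, expected) in enumerate(zip(blocks, checksums))
        if zlib.crc32(block) != expected
    ]


if __name__ == "__main__":
    blocks, checksums = write_blocks(bytes(range(256)) * 48)  # three blocks

    # Simulate a device transiently corrupting one bit of file data.
    damaged = bytearray(blocks[1])
    damaged[10] ^= 0x01
    blocks[1] = bytes(damaged)

    print("corrupted blocks:", verify_blocks(blocks, checksums))
```

A filesystem that only checksums metadata has nothing like the checksums
list for file contents, so the flipped bit in block 1 would be returned to
the reader as if it were good data.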

Anyway, in roughly 5 years of using it as my primary filesystem, I've had
no problems on single device Btrfs that weren't user induced. On raid1 and
raid10 I have found meaningful bugs and deficient features that a user
can innocently run into, despite being known and documented. But in those
cases, while I lost redundancy, no data was lost or corrupted.

Meanwhile the Btrfs list still fields weird failures on occasion. The steps
for repairing Btrfs if it face plants are really non-obvious. It's a lot
like throwing spaghetti at a wall (even though that's not how you should
test your spaghetti, people!). Eventually the idea is that kernel code
should not f up in the first place even if the device lies, and even if it
goes badly, it can recover and fix itself without fsck. The fsck is really
something of a debugging and time saving tool, not the ideal scenario.
Consider that Btrfs
is supposed to scale, and none of the current raid levels scale, and fsck
doesn't scale either, nor does scrub, or balance. So Btrfs is going to keep
changing and getting better.


--
Chris Murphy

_______________________________________________
users mailing list -- users@lists.fedoraproject.org
To unsubscribe send an email to users-le...@lists.fedoraproject.org