I'm building up a pretty bad history with BTRFS as the default filesystem for Fedora Workstation.  It keeps messing up and leaving me stuck.  I should note that I have used ext2/3/4 for about 20 years, ZFS on Solaris for even longer, and ZFS on Ubuntu for 2 major releases now.  So far, 2 different machines have had issues.

On my Gateway Fedora 33 daily-driver machine running the 5.11 kernels, I had a single Patriot SSD using the default BTRFS partitioning scheme.  BTRFS scrub kept reporting uncorrectable errors, and I assumed the SSD was defective.  However, this SSD is now in a Mandriva machine and is solid.  It's not the SSD.  After the machine locked up at random times, I did about 10 reinstalls and finally trashed the machine in frustration.  I later discovered that it had developed a bad memory stick, which may have been the initial cause of the problem.  However, the lack of BTRFS robustness, the absence of any obvious way to keep /home during a reinstall, and the very poor BTRFS documentation have left me wary.

On my current daily-driver machine, I have fully-updated Fedora 34 running the 5.12 kernels on 2 disks set up as a BTRFS RAID-1 pair.  I expected that to be much more robust than the single-disk setup on my F33 machine, giving me error protection similar to what I would have on ZFS.  Unfortunately, that does not appear to be the case.  I have run low-level diagnostics on everything in this machine, and it is all working properly.  Unusually, there aren't even any failed low-level disk blocks on either drive.  So the hardware on this older enterprise-class Lenovo desktop is not faulty.  I believe that faulty BIOS and security chip handling in the 5.12 kernel is why I occasionally have to hard power-cycle the machine to get it to actually power down.

One would expect that with BTRFS doing RAID-1, recovery from lockups should never leave the filesystem damaged.  That does not appear to be the case.  Currently the disks have no low-level errors, but BTRFS scrub shows 10 unrecoverable errors.  That's messed up.  Both disks are enterprise-class Seagate Constellation 500GB SATA drives with slightly different model numbers and manufacturing dates, so I don't believe there is any firmware issue with them.  Whatever the cause, I expect the initial fsck or btrfs check to preserve data integrity, at worst backing out a few seconds of journal transactions.

I am aware of at least one kernel bug that is highly relevant as the initial trigger: bugzilla 195809.  I believe there are serious bugs in the hardware optimization in Firefox (I have filed one bug) and in GNOME, and more relevant bugs in the kernel, but whatever the trigger, the filesystem should never fail.

How do I recover?  The machine is currently bootable and seems to run OK, but it locks up once in a while on power-down and when exiting Firefox.  I cannot describe it as stable with this BTRFS issue.  A scrub currently says that / (and therefore also /home) has 10 unrecoverable errors.  I can find no Fedora or SUSE documentation on how to recover from situations like this, which should be impossible.  A reinstall will not preserve /home, which would mean unacceptable data loss.  An offline btrfs check on my F33 machine left it unbootable, so that's probably not an option either.  I'm stuck at this point.  Should I just stop using the default BTRFS filesystem and go back to ext4?

Help appreciated!

--

John Mellor

_______________________________________________
users mailing list -- users@lists.fedoraproject.org
To unsubscribe send an email to users-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/users@lists.fedoraproject.org
Do not reply to spam on the list, report it: 
https://pagure.io/fedora-infrastructure
