On 7/1/20 2:50 PM, Josef Bacik wrote:
> On 7/1/20 2:24 PM, Matthew Miller wrote:
>> On Wed, Jul 01, 2020 at 06:54:02AM +0000, Zbigniew Jędrzejewski-Szmek wrote:
>>> Making btrfs opt-in for F33 and (assuming the result go well) opt-out
>>> for F34 could be good option. I know technically it is already opt-in,
>>> but it's not very visible or popular. We could make the btrfs option
>>> more prominent and ask people to pick it if they are ready to handle
>>> potential fallout.
>>
>> I'm leaning towards recommending this as well. I feel like we don't have
>> good data to make a decision on -- the work that Red Hat did previously when
>> making a decision was 1) years ago and 2) server-focused, and the Facebook
>> production usage is encouraging but also not the same use case. I'm
>> particularly concerned about metadata corruption fragility as noted in the
>> Usenix paper. (It'd be nice if we could do something about that!)
>>
>
> There's only so much we can do about this. I've sent up patches to ignore
> failed global trees to allow users to more easily recover data in case of
> corruption in the case of global trees, but as they say if only 1 bit is off
> in a node, we throw the whole node away. And throwing a node away means you
> lose access to any of its children, which could be a large chunk of the file
> system.
>
> This sounds like a "wtf, why are you doing this btrfs?" sort of thing, but
> this is just the reality of using checksums. It's a checksum, not ECC. We
> don't know _which_ bits are fucked, we just know somethings fucked, so we
> throw it all away. If you have RAID or DUP then we go read the other copy,
> and fix the broken copy if we find a good copy. If we don't, well then
> there's nothing really we can do.

There is often a path forward when a bad metadata checksum is detected.
For example, in e2fsck:

scan_extent_node()
{
...
        /* Failed csum but passes checks?  Ask to fix checksum. */
        if (failed_csum &&
            fix_problem(ctx, PR_1_EXTENT_ONLY_CSUM_INVALID, pctx)) {
                pb->inode_modified = 1;
                pctx->errcode = ext2fs_extent_replace(ehandle, 0, &extent);
                if (pctx->errcode)
                        return;
        }

It does similarly for many types of metadata:

/* inode passes checks, but checksum does not match inode */
#define PR_1_INODE_ONLY_CSUM_INVALID            0x010068
--
/* Inode extent block passes checks, but checksum does not match extent */
#define PR_1_EXTENT_ONLY_CSUM_INVALID           0x01006A
--
/* Inode extended attribute block passes checks, but checksum does not
 * match block. */
#define PR_1_EA_BLOCK_ONLY_CSUM_INVALID         0x01006C
--
/* dir leaf node passes checks, but fails checksum */
#define PR_2_LEAF_NODE_ONLY_CSUM_INVALID        0x02004D

Does btrfsck really never attempt to salvage a metadata block with a bad CRC
by validating its fields?

-Eric
_______________________________________________
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
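
As a rough illustration of the "passes checks, but fails checksum" policy
in the e2fsck excerpt above, here is a minimal sketch in C. It is not code
from e2fsprogs or btrfs-progs: the block layout (struct demo_block), the
magic value, the limits, and every demo_* name are hypothetical, chosen only
to show the three-way decision between "fine", "only the checksum is wrong",
and "the fields themselves are bad".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk metadata block; all names and limits are made up. */
#define DEMO_MAGIC     0x4D455441u
#define DEMO_MAX_LEVEL 8u
#define DEMO_MAX_ITEMS 16u

struct demo_block {
        uint32_t magic;
        uint32_t level;
        uint32_t nritems;
        uint32_t keys[DEMO_MAX_ITEMS];
        uint32_t csum;          /* CRC-32 of every byte before this field */
};

enum demo_verdict {
        DEMO_OK,                /* fields and checksum both good */
        DEMO_CSUM_ONLY,         /* fields look sane, checksum mismatch */
        DEMO_CORRUPT            /* fields fail validation */
};

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320). */
uint32_t demo_crc32(const void *data, size_t len)
{
        const uint8_t *p = data;
        uint32_t crc = 0xFFFFFFFFu;

        for (size_t i = 0; i < len; i++) {
                crc ^= p[i];
                for (int bit = 0; bit < 8; bit++)
                        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
        return ~crc;
}

/* Structural checks that do not depend on the checksum at all. */
int demo_fields_valid(const struct demo_block *blk)
{
        assert(blk != NULL);
        if (blk->magic != DEMO_MAGIC || blk->level >= DEMO_MAX_LEVEL ||
            blk->nritems > DEMO_MAX_ITEMS)
                return 0;
        /* Keys in use must be strictly increasing. */
        for (uint32_t i = 1; i < blk->nritems; i++)
                if (blk->keys[i - 1] >= blk->keys[i])
                        return 0;
        return 1;
}

/* Classify a block: a bad checksum alone is treated as repairable. */
enum demo_verdict demo_check(const struct demo_block *blk)
{
        uint32_t want = demo_crc32(blk, offsetof(struct demo_block, csum));

        if (!demo_fields_valid(blk))
                return DEMO_CORRUPT;
        return blk->csum == want ? DEMO_OK : DEMO_CSUM_ONLY;
}

/* The "fix" step: recompute the checksum over the current contents. */
void demo_fix_csum(struct demo_block *blk)
{
        assert(blk != NULL);
        blk->csum = demo_crc32(blk, offsetof(struct demo_block, csum));
}
```

A caller would run demo_check() and call demo_fix_csum() only on
DEMO_CSUM_ONLY, ideally after asking the user, as fix_problem() does in the
excerpt. Note the limit of the approach: a flipped bit that leaves every
field within range (say, one key changed to another value that is still in
order) passes validation, so rewriting the checksum would bless bad data.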