On 7/1/20 2:50 PM, Josef Bacik wrote:
> On 7/1/20 2:24 PM, Matthew Miller wrote:
>> On Wed, Jul 01, 2020 at 06:54:02AM +0000, Zbigniew Jędrzejewski-Szmek wrote:
>>> Making btrfs opt-in for F33 and (assuming the results go well) opt-out
>>> for F34 could be a good option. I know technically it is already opt-in,
>>> but it's not very visible or popular. We could make the btrfs option more
>>> prominent and ask people to pick it if they are ready to handle potential
>>> fallout.
>>
>> I'm leaning towards recommending this as well. I feel like we don't have
>> good data to make a decision on -- the work that Red Hat did previously when
>> making a decision was 1) years ago and 2) server-focused, and the Facebook
>> production usage is encouraging but also not the same use case. I'm
>> particularly concerned about metadata corruption fragility as noted in the
>> Usenix paper. (It'd be nice if we could do something about that!)
>>
> 
> There's only so much we can do about this.  I've sent patches upstream to 
> ignore failed global trees, so users can more easily recover data when those 
> trees are corrupted, but as they say, if even one bit is off in a node, we 
> throw the whole node away.  And throwing a node away means you lose access 
> to all of its children, which could be a large chunk of the file system.
> 
> This sounds like a "wtf, why are you doing this, btrfs?" sort of thing, but 
> this is just the reality of using checksums.  It's a checksum, not ECC.  We 
> don't know _which_ bits are fucked, we just know something's fucked, so we 
> throw it all away.  If you have RAID or DUP then we go read the other copy, 
> and fix the broken copy if we find a good one.  If we don't find a good 
> copy, there's nothing we can do.

There is often a path forward when a bad metadata checksum is detected,
e.g. in e2fsck:

scan_extent_node() {
...
        /* Failed csum but passes checks?  Ask to fix checksum. */
        if (failed_csum &&
            fix_problem(ctx, PR_1_EXTENT_ONLY_CSUM_INVALID, pctx)) {
                pb->inode_modified = 1;
                pctx->errcode = ext2fs_extent_replace(ehandle, 0, &extent);
                if (pctx->errcode)
                        return;
        }
...
}

It does similarly for many other types of metadata:

/* inode passes checks, but checksum does not match inode */
#define PR_1_INODE_ONLY_CSUM_INVALID            0x010068
--
/* Inode extent block passes checks, but checksum does not match extent */
#define PR_1_EXTENT_ONLY_CSUM_INVALID           0x01006A
--
/* Inode extended attribute block passes checks, but checksum does not
 * match block. */
#define PR_1_EA_BLOCK_ONLY_CSUM_INVALID         0x01006C
--
/* dir leaf node passes checks, but fails checksum */
#define PR_2_LEAF_NODE_ONLY_CSUM_INVALID        0x02004D

Does btrfsck really never attempt to salvage a metadata block with a bad CRC by
validating its fields?

-Eric
_______________________________________________
devel mailing list -- devel@lists.fedoraproject.org