Igor;

Does this only impact CephFS then?

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


-----Original Message-----
From: Igor Fedotov [mailto:ifedo...@suse.de] 
Sent: Monday, April 12, 2021 9:16 AM
To: Dominic Hilsbos; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: OSDs RocksDB corrupted when upgrading 
nautilus->octopus: unknown WriteBatch tag

The workaround would be to disable bluestore_fsck_quick_fix_on_mount, do the
upgrade, and then run a regular fsck.

Depending on the fsck results, either proceed with a repair or not.
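
For example, roughly (an untested sketch; adjust OSD ids and paths to your
setup, and note that ceph-bluestore-tool needs the OSD to be stopped):

     # before restarting OSDs on the new version
     ceph config set osd bluestore_fsck_quick_fix_on_mount false

     # after the upgrade, per OSD, while that OSD is stopped
     ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-<id>

     # only if fsck reports issues you decide to fix
     ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-<id>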


Thanks,

Igor


On 4/12/2021 6:35 PM, dhils...@performair.com wrote:
> Is there a way to check for these zombie blobs, and other issues needing 
> repair, prior to the upgrade?  That would allow us to know that issues might 
> be coming, and perhaps address them before they result in corrupt OSDs.
>
> I'm considering upgrading our clusters from 14 to 15, and would really like 
> to avoid these kinds of issues.
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International Inc.
> dhils...@performair.com
> www.PerformAir.com
>
> -----Original Message-----
> From: Igor Fedotov [mailto:ifedo...@suse.de]
> Sent: Monday, April 12, 2021 7:55 AM
> To: ceph-users@ceph.io
> Subject: [ceph-users] Re: OSDs RocksDB corrupted when upgrading 
> nautilus->octopus: unknown WriteBatch tag
>
> Sorry for being too late to the party...
>
> I think the root cause is related to the large number of repairs made
> during the first post-upgrade fsck run.
>
> The check (and fix) for zombie spanning blobs was backported to v15.2.9
> (here is the PR: https://github.com/ceph/ceph/pull/39256). And I presume
> it's the one that causes BlueFS data corruption, due to the huge
> transaction happening during such a repair.
>
> I haven't seen this exact issue (as having that many zombie blobs is a
> rare bug by itself), but we had a somewhat similar issue with upgrading
> omap names, see: https://github.com/ceph/ceph/pull/39377
>
> The huge resulting transaction could cause too large a write to the WAL,
> which in turn caused data corruption (see
> https://github.com/ceph/ceph/pull/39701).
>
> Although the fix for the latter has been merged for 15.2.10, some
> additional issues with huge transactions might still exist...
>
>
> If someone can afford another OSD loss, it would be interesting to get an
> OSD log for such a repair with debug-bluefs set to 20...
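>
> For example, something like this in ceph.conf on that host before starting
> the upgraded OSD (just a sketch; pick the OSD you are willing to sacrifice):
>
>      [osd]
>      debug bluefs = 20/20
>
> and then grab /var/log/ceph/ceph-osd.<id>.log after the crash.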
>
> I'm planning to make a fix to cap the transaction size for repair in the
> near future anyway, though...
>
>
> Thanks,
>
> Igor
>
>
> On 4/12/2021 5:15 PM, Dan van der Ster wrote:
>> Too bad. Let me continue trying to invoke Cunningham's Law for you ... ;)
>>
>> Have you excluded any possible hardware issues?
>>
>> 15.2.10 has a new option to check for all-zero reads; maybe try setting
>> it to true?
>>
>>       Option("bluefs_check_for_zeros", Option::TYPE_BOOL, Option::LEVEL_DEV)
>>       .set_default(false)
>>       .set_flag(Option::FLAG_RUNTIME)
>>       .set_description("Check data read for suspicious pages")
>>       .set_long_description("Looks into data read to check if there is a 4K block entirely filled with zeros. "
>>                             "If this happens, we re-read data. If there is difference, we print error to log.")
>>       .add_see_also("bluestore_retry_disk_reads"),
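>>
>> To turn it on, presumably something like this would do (a guess at the
>> simplest way, once the OSD host is on 15.2.10):
>>
>>      ceph config set osd bluefs_check_for_zeros true
>>
>> or the equivalent line in ceph.conf before starting the OSD.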
>>
>> The "fix zombie spanning blobs" feature was added in 15.2.9. Does
>> 15.2.8 work for you?
>>
>> Cheers, Dan
>>
>> On Sun, Apr 11, 2021 at 10:17 PM Jonas Jelten <jel...@in.tum.de> wrote:
>>> Thanks for the idea. I've tried it with 1 thread, and it shredded another
>>> OSD. I've updated the tracker ticket :)
>>>
>>> At least non-race-condition bugs are hopefully easier to spot...
>>>
>>> I wouldn't just disable the fsck and upgrade anyway until the cause is 
>>> rooted out.
>>>
>>> -- Jonas
>>>
>>>
>>> On 29/03/2021 14.34, Dan van der Ster wrote:
>>>> Hi,
>>>>
>>>> Saw that, looks scary!
>>>>
>>>> I have no experience with that particular crash, but I was thinking
>>>> that if you have already backfilled the degraded PGs, and can afford
>>>> to try another OSD, you could try:
>>>>
>>>>       "bluestore_fsck_quick_fix_threads": "1",  # because
>>>> https://github.com/facebook/rocksdb/issues/5068 showed a similar crash
>>>> and the dev said it occurs because WriteBatch is not thread safe.
>>>>
>>>>       "bluestore_fsck_quick_fix_on_mount": "false", # should disable the
>>>> fsck during upgrade. See https://github.com/ceph/ceph/pull/40198
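>>>>
>>>> i.e., roughly, in ceph.conf on that host before restarting the upgraded
>>>> OSD (a sketch, untested here):
>>>>
>>>>      [osd]
>>>>      bluestore_fsck_quick_fix_threads = 1
>>>>
>>>> or bluestore_fsck_quick_fix_on_mount = false if you want to skip the
>>>> on-mount fsck entirely (the same can be set via "ceph config set osd ...").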
>>>>
>>>> -- Dan
>>>>
>>>> On Mon, Mar 29, 2021 at 2:23 PM Jonas Jelten <jel...@in.tum.de> wrote:
>>>>> Hi!
>>>>>
>>>>> After upgrading MONs and MGRs successfully, upgrading the first OSD host
>>>>> (Ubuntu Bionic) from 14.2.16 to 15.2.10 shredded all OSDs on it by
>>>>> corrupting their RocksDB, and they now refuse to boot.
>>>>> RocksDB complains "Corruption: unknown WriteBatch tag".
>>>>>
>>>>> The initial crash/corruption occurred when the automatic fsck was run and
>>>>> committed the changes for a lot of "zombie spanning blobs".
>>>>>
>>>>> Tracker issue with logs: https://tracker.ceph.com/issues/50017
>>>>>
>>>>>
>>>>> Has anyone else encountered this error? I've "suspended" the upgrade for
>>>>> now :)
>>>>>
>>>>> -- Jonas
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
