Igor;

Does this only impact CephFS then?
Thank you,

Dominic L. Hilsbos, MBA
Director – Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com

-----Original Message-----
From: Igor Fedotov [mailto:ifedo...@suse.de]
Sent: Monday, April 12, 2021 9:16 AM
To: Dominic Hilsbos; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: OSDs RocksDB corrupted when upgrading nautilus->octopus: unknown WriteBatch tag

The workaround would be to disable bluestore_fsck_quick_fix_on_mount, do the upgrade, and then run a regular fsck. Depending on the fsck results, either proceed with a repair or not.

Thanks,

Igor

On 4/12/2021 6:35 PM, dhils...@performair.com wrote:
> Is there a way to check for these zombie blobs, and other issues needing repair, prior to the upgrade? That would allow us to know that issues might be coming, and perhaps address them before they result in corrupt OSDs.
>
> I'm considering upgrading our clusters from 14 to 15, and would really like to avoid these kinds of issues.
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International Inc.
> dhils...@performair.com
> www.PerformAir.com
>
> -----Original Message-----
> From: Igor Fedotov [mailto:ifedo...@suse.de]
> Sent: Monday, April 12, 2021 7:55 AM
> To: ceph-users@ceph.io
> Subject: [ceph-users] Re: OSDs RocksDB corrupted when upgrading nautilus->octopus: unknown WriteBatch tag
>
> Sorry for being too late to the party...
>
> I think the root cause is related to the high number of repairs made during the first post-upgrade fsck run.
>
> The check (and fix) for zombie spanning blobs was backported to v15.2.9 (here is the PR: https://github.com/ceph/ceph/pull/39256), and I presume it's the one causing the BlueFS data corruption, due to the huge transaction happening during such a repair.
>
> I haven't seen this exact issue (as having that many zombie blobs is a rarely met bug by itself), but we had a somewhat similar issue with upgrading omap names, see: https://github.com/ceph/ceph/pull/39377
>
> The huge resulting transaction could cause too big a write to the WAL, which in turn caused data corruption (see https://github.com/ceph/ceph/pull/39701).
>
> Although the fix for the latter has been merged for 15.2.10, some additional issues with huge transactions might still exist...
>
> If someone can afford another OSD loss, it could be interesting to get an OSD log for such a repair with debug-bluefs set to 20...
>
> I'm planning to make a fix to cap the transaction size for repair in the near future anyway, though.
>
> Thanks,
>
> Igor
>
> On 4/12/2021 5:15 PM, Dan van der Ster wrote:
>> Too bad. Let me continue trying to invoke Cunningham's Law for you ... ;)
>>
>> Have you excluded any possible hardware issues?
>>
>> 15.2.10 has a new option to check for all-zero reads; maybe try it with true?
>>
>>     Option("bluefs_check_for_zeros", Option::TYPE_BOOL, Option::LEVEL_DEV)
>>     .set_default(false)
>>     .set_flag(Option::FLAG_RUNTIME)
>>     .set_description("Check data read for suspicious pages")
>>     .set_long_description("Looks into data read to check if there is a 4K block entirely filled with zeros. "
>>                           "If this happens, we re-read data. If there is difference, we print error to log.")
>>     .add_see_also("bluestore_retry_disk_reads"),
>>
>> The "fix zombie spanning blobs" feature was added in 15.2.9. Does 15.2.8 work for you?
>>
>> Cheers, Dan
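For anyone who wants to try this, here is a rough, untested sketch of how Igor's workaround and the bluefs_check_for_zeros option Dan quotes might be applied; the OSD id and data path are placeholders, so adjust them to your deployment:

    # Before upgrading an OSD host: keep the automatic quick-fix from running at mount time.
    ceph config set osd bluestore_fsck_quick_fix_on_mount false
    # (or put "bluestore_fsck_quick_fix_on_mount = false" in the [osd] section of ceph.conf)

    # Optionally, once on 15.2.10, also enable the all-zero read check quoted above:
    ceph config set osd bluefs_check_for_zeros true

    # After the upgrade, fsck each OSD offline:
    systemctl stop ceph-osd@<ID>
    ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-<ID>

    # Only if fsck reports errors, run a repair, then bring the OSD back up:
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-<ID>
    systemctl start ceph-osd@<ID>

Note that Jonas's comment further down still applies: skipping the on-mount fsck only defers the repair, it does not address the underlying corruption.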
>> On Sun, Apr 11, 2021 at 10:17 PM Jonas Jelten <jel...@in.tum.de> wrote:
>>> Thanks for the idea. I've tried it with 1 thread, and it shredded another OSD.
>>> I've updated the tracker ticket :)
>>>
>>> At least non-race-condition bugs are hopefully easier to spot...
>>>
>>> I wouldn't just disable the fsck and upgrade anyway until the cause is rooted out.
>>>
>>> -- Jonas
>>>
>>> On 29/03/2021 14.34, Dan van der Ster wrote:
>>>> Hi,
>>>>
>>>> Saw that, looks scary!
>>>>
>>>> I have no experience with that particular crash, but I was thinking that if you have already backfilled the degraded PGs, and can afford to try another OSD, you could try:
>>>>
>>>> "bluestore_fsck_quick_fix_threads": "1",  # because https://github.com/facebook/rocksdb/issues/5068 showed a similar crash, and the dev said it occurs because WriteBatch is not thread-safe
>>>>
>>>> "bluestore_fsck_quick_fix_on_mount": "false",  # should disable the fsck during the upgrade; see https://github.com/ceph/ceph/pull/40198
>>>>
>>>> -- Dan
>>>>
>>>> On Mon, Mar 29, 2021 at 2:23 PM Jonas Jelten <jel...@in.tum.de> wrote:
>>>>> Hi!
>>>>>
>>>>> After upgrading MONs and MGRs successfully, the first OSD host I upgraded on Ubuntu Bionic from 14.2.16 to 15.2.10 shredded all OSDs on it by corrupting RocksDB, and they now refuse to boot.
>>>>> RocksDB complains "Corruption: unknown WriteBatch tag".
>>>>>
>>>>> The initial crash/corruption occurred when the automatic fsck was run and it committed the changes for a lot of "zombie spanning blobs".
>>>>>
>>>>> Tracker issue with logs: https://tracker.ceph.com/issues/50017
>>>>>
>>>>> Anyone else encountered this error? I've "suspended" the upgrade for now :)
>>>>>
>>>>> -- Jonas
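For Igor's request above (an OSD log of the repair with debug-bluefs at 20), one possible way to capture it on a sacrificial OSD is sketched below; the OSD id and log path are placeholders and assume a non-containerized deployment:

    # Raise the BlueFS debug level for the OSD you are willing to lose,
    # e.g. in ceph.conf on that host before restarting it on 15.2.x:
    [osd]
        debug bluefs = 20

    # or at runtime via the monitors for a single OSD:
    ceph config set osd.<ID> debug_bluefs 20

    # Leave bluestore_fsck_quick_fix_on_mount at its default (true) so the repair
    # runs at startup, then collect /var/log/ceph/ceph-osd.<ID>.log
    # (it will be very large at this debug level).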