We just had metadata damage show up on our Jewel cluster. I tried a few things, like renaming directories and scrubbing, but the damage would just show up again in less than 24 hours. I finally copied the damaged directories to a tmp location on CephFS, then swapped each copy with its damaged original. When I deleted the damaged directories the active MDS crashed, but the standby replay took over just fine. I haven't seen the messages now for almost a week.

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
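A rough sketch of the copy-and-swap workaround described above (the paths and the `swap_out_damaged` helper name are illustrative, not from the original post, and this is only the idea, not a vetted procedure):

```shell
# Replace a directory whose metadata is damaged with a freshly-copied
# twin, so the new tree gets new inodes and backtraces. Destructive --
# only a sketch of the approach described above.
swap_out_damaged() {
    src=$1                      # damaged directory on the CephFS mount
    tmp=${src}.copy             # scratch location for the clean copy

    cp -a "$src" "$tmp"         # deep-copy contents (creates new inodes)
    mv "$src" "${src}.damaged"  # set the damaged tree aside
    mv "$tmp" "$src"            # promote the clean copy
    rm -rf "${src}.damaged"     # drop the damaged tree (this delete is
                                # the step that crashed the active MDS
                                # in the report above)
}
```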
On Mon, Aug 19, 2019 at 10:30 PM Lars Täuber <taeu...@bbaw.de> wrote:
> Hi there!
>
> Does anyone else have an idea what I could do to get rid of this error?
>
> BTW: it is the third time that pg 20.0 has gone inconsistent.
> This is a pg from the metadata pool (cephfs).
> Might this be related somehow?
>
> # ceph health detail
> HEALTH_ERR 1 MDSs report damaged metadata; 1 scrub errors; Possible data damage: 1 pg inconsistent
> MDS_DAMAGE 1 MDSs report damaged metadata
>     mdsmds3(mds.0): Metadata damage detected
> OSD_SCRUB_ERRORS 1 scrub errors
> PG_DAMAGED Possible data damage: 1 pg inconsistent
>     pg 20.0 is active+clean+inconsistent, acting [9,27,15]
>
>
> Best regards,
> Lars
>
>
> Mon, 19 Aug 2019 13:51:59 +0200
> Lars Täuber <taeu...@bbaw.de> ==> Paul Emmerich <paul.emmer...@croit.io> :
> > Hi Paul,
> >
> > thanks for the hint.
> >
> > I did a recursive scrub from "/". The log says there were some inodes
> > with bad backtraces repaired, but the error remains.
> > Could this have something to do with a deleted file? Or a file within a
> > snapshot?
> >
> > The path reported by
> >
> > # ceph tell mds.mds3 damage ls
> > 2019-08-19 13:43:04.608 7f563f7f6700  0 client.894552 ms_handle_reset on v2:192.168.16.23:6800/176704036
> > 2019-08-19 13:43:04.624 7f56407f8700  0 client.894558 ms_handle_reset on v2:192.168.16.23:6800/176704036
> > [
> >     {
> >         "damage_type": "backtrace",
> >         "id": 3760765989,
> >         "ino": 1099518115802,
> >         "path": "~mds0/stray7/100005161f7/dovecot.index.backup"
> >     }
> > ]
> >
> > looks a bit strange to me.
> >
> > Are snapshots also repaired by a recursive repair operation?
> >
> > Thanks
> > Lars
> >
> >
> > Mon, 19 Aug 2019 13:30:53 +0200
> > Paul Emmerich <paul.emmer...@croit.io> ==> Lars Täuber <taeu...@bbaw.de> :
> > > Hi,
> > >
> > > that error just says that the path is wrong. I unfortunately don't
> > > know the correct way to instruct it to scrub a stray path off the top
> > > of my head; you can always run a recursive scrub on / to go over
> > > everything, though.
> > >
> > >
> > > Paul
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
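For reference, the commands discussed in this thread look roughly like this. The MDS name (mds3), damage id, and pg id are taken from the output quoted above; the scrub syntax differs between releases, so treat this as a sketch rather than a recipe:

```shell
# Recursive scrub + repair from the filesystem root (Paul's suggestion).
# Admin-socket form used on Mimic and earlier:
ceph daemon mds.mds3 scrub_path / recursive repair
# On Nautilus and later the tell form is used instead:
#   ceph tell mds.mds3 scrub start / recursive,repair

# Inspect recorded metadata damage, and clear an entry once it is
# actually resolved (id from the "damage ls" output above):
ceph tell mds.mds3 damage ls
ceph tell mds.mds3 damage rm 3760765989

# Ask the OSDs to repair the inconsistent metadata pg after inspecting
# the scrub error:
ceph pg repair 20.0
```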