Hi David,

Thanks for the reply, it's appreciated.  We're going to upgrade the cluster to Kraken and see if that fixes the metadata issue.
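For the record, the rough sequence we have in mind is below.  The unit names assume the standard systemd units on CentOS 7, and the ordering is just my reading of your advice and the upgrade notes (mons before OSDs, MDS last), so treat it as a sketch rather than a tested procedure:

  ceph osd set noout                   # keep data in place while OSDs restart during the upgrade
  systemctl restart ceph-osd@3         # after the package upgrade on each node; Kraken v11.2.0 OSDs
  systemctl restart ceph-osd@10        # should repair their local omaps on first start
  systemctl restart ceph-osd@11
  systemctl restart ceph-osd@23
  ceph pg repair 2.9                   # then ask the primary to make the four replicas consistent
  ceph osd unset noout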
J

On 2 May 2017 at 17:00, David Zafman <dzaf...@redhat.com> wrote:
>
> James,
>
> You have an omap corruption.  It is likely caused by a bug which has already been identified.  A fix for that problem is available but it is still pending backport for the next Jewel point release.  All 4 of your replicas have different "omap_digest" values.
>
> Instead of the xattrs, the ceph-osdomap-tool --command dump-objects-with-keys output from OSDs 3, 10, 11, 23 would be interesting to compare.
>
> ***WARNING*** Please back up your data before doing any repair attempts.
>
> If you can upgrade to Kraken v11.2.0, it will auto repair the omaps on ceph-osd start up.  It will likely still require a ceph pg repair to make the 4 replicas consistent with each other.  The final result may be the reappearance of removed MDS files in the directory.
>
> If you can recover the data, you could remove the directory entirely and rebuild it.  The original bug was triggered during omap deletion, typically in a large directory, which corresponds to an individual unlink in cephfs.
>
> If you can build a branch in github to get the newer ceph-osdomap-tool, you could try to use it to repair the omaps.
>
> David
>
>
> On 5/2/17 5:05 AM, James Eckersall wrote:
>
> Hi,
>
> I'm having some issues with a ceph cluster.  It's an 8 node cluster running Jewel ceph-10.2.7-0.el7.x86_64 on CentOS 7.
> This cluster provides RBDs and a CephFS filesystem to a number of clients.
>
> ceph health detail is showing the following errors:
>
> pg 2.9 is active+clean+inconsistent, acting [3,10,11,23]
> 1 scrub errors
> mds0: Metadata damage detected
>
> PG 2.9 is in the cephfs_metadata pool (id 2).
>
> I've looked at the OSD logs for OSD 3, which is the primary for this PG, but the only thing that appears relating to this PG is the following:
>
> log_channel(cluster) log [ERR] : 2.9 deep-scrub 1 errors
>
> After initiating a ceph pg repair 2.9, I see the following in the primary OSD log:
>
> log_channel(cluster) log [ERR] : 2.9 repair 1 errors, 0 fixed
> log_channel(cluster) log [ERR] : 2.9 deep-scrub 1 errors
>
> I found the below command in a previous ceph-users post.  Running it returns the following:
>
> # rados list-inconsistent-obj 2.9
> {"epoch":23738,"inconsistents":[{"object":{"name":"10000411194.00000000","nspace":"","locator":"","snap":"head","version":14737091},"errors":["omap_digest_mismatch"],"union_shard_errors":[],"selected_object_info":"2:9758b358:::10000411194.00000000:head(33456'14737091 mds.0.214448:248532 dirty|omap|data_digest s 0 uv 14737091 dd ffffffff)","shards":[{"osd":3,"errors":[],"size":0,"omap_digest":"0x6748eef3","data_digest":"0xffffffff"},{"osd":10,"errors":[],"size":0,"omap_digest":"0xa791d5a4","data_digest":"0xffffffff"},{"osd":11,"errors":[],"size":0,"omap_digest":"0x53f46ab0","data_digest":"0xffffffff"},{"osd":23,"errors":[],"size":0,"omap_digest":"0x97b80594","data_digest":"0xffffffff"}]}]}
>
> So from this, I think that the object in PG 2.9 with the problem is 10000411194.00000000.
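A note for the archives on the omap dumps David mentions above: my reading of the tool's usage is roughly the following, with the OSD stopped first because its leveldb can't be opened by two processes at once, and with the usual FileStore omap path assumed.  It's a sketch, not a verified recipe.

  systemctl stop ceph-osd@3                         # the omap leveldb can't be shared with a running OSD
  ceph-osdomap-tool --omap-path /var/lib/ceph/osd/ceph-3/current/omap \
      --command dump-objects-with-keys > /tmp/osd3-omap.txt
  systemctl start ceph-osd@3
  # repeat for OSDs 10, 11 and 23, then compare the four dumps, e.g.
  diff /tmp/osd3-omap.txt /tmp/osd10-omap.txt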
> This is what I see on the filesystem on the 4 OSDs this PG resides on:
>
> -rw-r--r--. 1 ceph ceph 0 Apr 27 12:31 /var/lib/ceph/osd/ceph-3/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
> -rw-r--r--. 1 ceph ceph 0 Apr 15 22:05 /var/lib/ceph/osd/ceph-10/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
> -rw-r--r--. 1 ceph ceph 0 Apr 15 22:07 /var/lib/ceph/osd/ceph-11/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
> -rw-r--r--. 1 ceph ceph 0 Apr 16 03:58 /var/lib/ceph/osd/ceph-23/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
>
> The extended attrs are as follows, although I have no idea what any of them mean.
>
> # file: var/lib/ceph/osd/ceph-11/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
> user.ceph._=0sDwj5AAAABAM1AAAAAAAAABQAAAAxMDAwMDQxMTE5NC4wMDAwMDAwMP7/////////6RrNGgAAAAAAAgAAAAAAAAAGAxwAAAACAAAAAAAAAP////8AAAAAAAAAAP//////////AAAAABUn4QAAAAAAu4IAAK4m4QAAAAAAu4IAAAICFQAAAAIAAAAAAAAAAOSZDAAAAAAAsEUDAAAAAAAAAAAAjUoIWUgWsQQCAhUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAVJ+EAAAAAAAAAAAAAAAAAABwAAACNSghZESm8BP///w==
> user.ceph._@1=0s//////8=
> user.ceph._layout=0sAgIYAAAAAAAAAAAAAAAAAAAA//////////8AAAAA
> user.ceph._parent=0sBQRPAQAAlBFBAAABAAAIAAAAAgIjAAAAjxFBAAABAAAPAAAAdHViZWFtYXRldXIubmV0qdgAAAAAAAACAh0AAAB/EUEAAAEAAAkAAAB3cC1yb2NrZXREAAAAAAAAAAICGQAAABYNQQAAAQAABQAAAGNhY2hlUgAAAAAAAAACAh4AAAAQDUEAAAEAAAoAAAB3cC1jb250ZW50NAMAAAAAAAACAhgAAAANDUEAAAEAAAQAAABodG1sIAEAAAAAAAACAikAAADagTMAAAEAABUAAABuZ2lueC1waHA3LWNsdmdmLWRhdGGJAAAAAAAAAAICMwAAADkAAAAAAQ==
> user.ceph._parent@1=0sAAAfAAAANDg4LTU3YjI2NTdmMmZhMTMtbWktcHJveWVjdG8tMXSQCAAAAAAAAgIcAAAAAQAAAAAAAAAIAAAAcHJvamVjdHPBAgcAAAAAAAIAAAAAAAAAAAAAAA==
> user.ceph.snapset=0sAgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==
> user.cephos.seq=0sAQEQAAAAgcAqFAAAAAAAAAAAAgAAAAA=
> user.cephos.spill_out=0sMAA=
> getfattr: Removing leading '/' from absolute path names
>
> # file: var/lib/ceph/osd/ceph-3/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
> user.ceph._=0sDwj5AAAABAM1AAAAAAAAABQAAAAxMDAwMDQxMTE5NC4wMDAwMDAwMP7/////////6RrNGgAAAAAAAgAAAAAAAAAGAxwAAAACAAAAAAAAAP////8AAAAAAAAAAP//////////AAAAABUn4QAAAAAAu4IAAK4m4QAAAAAAu4IAAAICFQAAAAIAAAAAAAAAAOSZDAAAAAAAsEUDAAAAAAAAAAAAjUoIWUgWsQQCAhUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAVJ+EAAAAAAAAAAAAAAAAAABwAAACNSghZESm8BP///w==
> user.ceph._@1=0s//////8=
> user.ceph._layout=0sAgIYAAAAAAAAAAAAAAAAAAAA//////////8AAAAA
> user.ceph._parent=0sBQRPAQAAlBFBAAABAAAIAAAAAgIjAAAAjxFBAAABAAAPAAAAdHViZWFtYXRldXIubmV0qdgAAAAAAAACAh0AAAB/EUEAAAEAAAkAAAB3cC1yb2NrZXREAAAAAAAAAAICGQAAABYNQQAAAQAABQAAAGNhY2hlUgAAAAAAAAACAh4AAAAQDUEAAAEAAAoAAAB3cC1jb250ZW50NAMAAAAAAAACAhgAAAANDUEAAAEAAAQAAABodG1sIAEAAAAAAAACAikAAADagTMAAAEAABUAAABuZ2lueC1waHA3LWNsdmdmLWRhdGGJAAAAAAAAAAICMwAAADkAAAAAAQ==
> user.ceph._parent@1=0sAAAfAAAANDg4LTU3YjI2NTdmMmZhMTMtbWktcHJveWVjdG8tMXSQCAAAAAAAAgIcAAAAAQAAAAAAAAAIAAAAcHJvamVjdHPBAgcAAAAAAAIAAAAAAAAAAAAAAA==
> user.ceph.snapset=0sAgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==
> user.cephos.seq=0sAQEQAAAAZaQ9GwAAAAAAAAAAAgAAAAA=
> user.cephos.spill_out=0sMAA=
> getfattr: Removing leading '/' from absolute path names
>
> # file: var/lib/ceph/osd/ceph-10/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
> user.ceph._=0sDwj5AAAABAM1AAAAAAAAABQAAAAxMDAwMDQxMTE5NC4wMDAwMDAwMP7/////////6RrNGgAAAAAAAgAAAAAAAAAGAxwAAAACAAAAAAAAAP////8AAAAAAAAAAP//////////AAAAABUn4QAAAAAAu4IAAK4m4QAAAAAAu4IAAAICFQAAAAIAAAAAAAAAAOSZDAAAAAAAsEUDAAAAAAAAAAAAjUoIWUgWsQQCAhUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAVJ+EAAAAAAAAAAAAAAAAAABwAAACNSghZESm8BP///w==
> user.ceph._@1=0s//////8=
> user.ceph._layout=0sAgIYAAAAAAAAAAAAAAAAAAAA//////////8AAAAA
> user.ceph._parent=0sBQRPAQAAlBFBAAABAAAIAAAAAgIjAAAAjxFBAAABAAAPAAAAdHViZWFtYXRldXIubmV0qdgAAAAAAAACAh0AAAB/EUEAAAEAAAkAAAB3cC1yb2NrZXREAAAAAAAAAAICGQAAABYNQQAAAQAABQAAAGNhY2hlUgAAAAAAAAACAh4AAAAQDUEAAAEAAAoAAAB3cC1jb250ZW50NAMAAAAAAAACAhgAAAANDUEAAAEAAAQAAABodG1sIAEAAAAAAAACAikAAADagTMAAAEAABUAAABuZ2lueC1waHA3LWNsdmdmLWRhdGGJAAAAAAAAAAICMwAAADkAAAAAAQ==
> user.ceph._parent@1=0sAAAfAAAANDg4LTU3YjI2NTdmMmZhMTMtbWktcHJveWVjdG8tMXSQCAAAAAAAAgIcAAAAAQAAAAAAAAAIAAAAcHJvamVjdHPBAgcAAAAAAAIAAAAAAAAAAAAAAA==
> user.ceph.snapset=0sAgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==
> user.cephos.seq=0sAQEQAAAA1T1dEQAAAAAAAAAAAgAAAAA=
> user.cephos.spill_out=0sMAA=
> getfattr: Removing leading '/' from absolute path names
>
> # file: var/lib/ceph/osd/ceph-23/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
> user.ceph._=0sDwj5AAAABAM1AAAAAAAAABQAAAAxMDAwMDQxMTE5NC4wMDAwMDAwMP7/////////6RrNGgAAAAAAAgAAAAAAAAAGAxwAAAACAAAAAAAAAP////8AAAAAAAAAAP//////////AAAAABUn4QAAAAAAu4IAAK4m4QAAAAAAu4IAAAICFQAAAAIAAAAAAAAAAOSZDAAAAAAAsEUDAAAAAAAAAAAAjUoIWUgWsQQCAhUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAVJ+EAAAAAAAAAAAAAAAAAABwAAACNSghZESm8BP///w==
> user.ceph._@1=0s//////8=
> user.ceph._layout=0sAgIYAAAAAAAAAAAAAAAAAAAA//////////8AAAAA
> user.ceph._parent=0sBQRPAQAAlBFBAAABAAAIAAAAAgIjAAAAjxFBAAABAAAPAAAAdHViZWFtYXRldXIubmV0qdgAAAAAAAACAh0AAAB/EUEAAAEAAAkAAAB3cC1yb2NrZXREAAAAAAAAAAICGQAAABYNQQAAAQAABQAAAGNhY2hlUgAAAAAAAAACAh4AAAAQDUEAAAEAAAoAAAB3cC1jb250ZW50NAMAAAAAAAACAhgAAAANDUEAAAEAAAQAAABodG1sIAEAAAAAAAACAikAAADagTMAAAEAABUAAABuZ2lueC1waHA3LWNsdmdmLWRhdGGJAAAAAAAAAAICMwAAADkAAAAAAQ==
> user.ceph._parent@1=0sAAAfAAAANDg4LTU3YjI2NTdmMmZhMTMtbWktcHJveWVjdG8tMXSQCAAAAAAAAgIcAAAAAQAAAAAAAAAIAAAAcHJvamVjdHPBAgcAAAAAAAIAAAAAAAAAAAAAAA==
> user.ceph.snapset=0sAgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==
> user.cephos.seq=0sAQEQAAAADiM7AAAAAAAAAAAAAgAAAAA=
> user.cephos.spill_out=0sMAA=
> getfattr: Removing leading '/' from absolute path names
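In case the raw attrs above are useful to anyone later: as I understand it, user.ceph._ is the encoded object_info_t, the 0s prefix just means getfattr is printing the value base64-encoded, and the _@1 attribute is a spill-over chunk of the same value.  Something along these lines should render it human-readable, though I haven't verified the exact ceph-dencoder invocation on this version, so treat it as a sketch.

  cd /var/lib/ceph/osd/ceph-11/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1
  getfattr --only-values -n user.ceph._   10000411194.00000000__head_1ACD1AE9__2  > /tmp/oi.bin  # raw object_info_t
  getfattr --only-values -n user.ceph._@1 10000411194.00000000__head_1ACD1AE9__2 >> /tmp/oi.bin  # append the spill-over chunk
  ceph-dencoder type object_info_t import /tmp/oi.bin decode dump_json                           # pretty-print it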
> With the metadata damage issue, I can get the list of inodes with the command below.
>
> $ ceph tell mds.0 damage ls | python -m "json.tool"
> [
>     {
>         "damage_type": "dir_frag",
>         "frag": "*",
>         "id": 5129156,
>         "ino": 1099556021325
>     },
>     {
>         "damage_type": "dir_frag",
>         "frag": "*",
>         "id": 8983971,
>         "ino": 1099548098243
>     },
>     {
>         "damage_type": "dir_frag",
>         "frag": "*",
>         "id": 33278608,
>         "ino": 1099548257921
>     },
>     {
>         "damage_type": "dir_frag",
>         "frag": "*",
>         "id": 33455691,
>         "ino": 1099548271575
>     },
>     {
>         "damage_type": "dir_frag",
>         "frag": "*",
>         "id": 38203788,
>         "ino": 1099548134708
>     },
> ...
>
> All of the inodes (approx. 800 of them) are for various directories within a WordPress cache directory.
> I ran an rm -rf on each of the directories as I do not need the content.  The content of the directories was removed, but the directories themselves cannot be removed: rmdir reports they are not empty, even though ls shows no files in them.
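One more note, in case it helps anyone correlate the two problems: as far as I can tell, each damaged dir_frag inode corresponds to an object in the cephfs_metadata pool named <inode-in-hex>.<fragment>, with 00000000 as the suffix for an unfragmented directory, so the damage entries can be mapped to object names, and the inconsistent object back to an inode, with nothing fancier than printf.  The commands below are a sketch along those lines; the naming convention is my understanding rather than something I've confirmed in the docs.

  $ printf '%x.00000000\n' 1099556021325             # first damaged inode above -> metadata pool object name
  10002a5644d.00000000
  $ printf '%d\n' 0x10000411194                      # inode of the inconsistent dirfrag object, in decimal
  1099515892116
  $ ceph tell mds.0 damage ls | python -m "json.tool" | grep -c '"dir_frag"'   # total dir_frag entries
  $ ceph tell mds.0 damage ls | python -m "json.tool" | grep 1099515892116     # is the scrub-error dirfrag among them?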
> I'm not sure if these two issues are related to each other.  They were noticed within a day of each other.  I think the metadata damage error appeared before the scrub error.
>
> I'm at a bit of a loss with how to proceed and I don't want to make things worse.
>
> I'd really appreciate any help that anyone can give to try and resolve these problems.
>
> Thanks
>
> J