Hi David,

Thanks for the reply, it's appreciated.
We're going to upgrade the cluster to Kraken and see if that fixes the
metadata issue.

J

On 2 May 2017 at 17:00, David Zafman <dzaf...@redhat.com> wrote:

>
> James,
>
>     You have an omap corruption.  It is likely caused by a bug which has
> already been identified.  A fix for that problem is available but it is
> still pending backport for the next Jewel point release.  All 4 of your
> replicas have different "omap_digest" values.
>
> Instead of the xattrs, it would be more interesting to compare the
> ceph-osdomap-tool --command dump-objects-with-keys output from OSDs 3,
> 10, 11 and 23.
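>
> A rough sketch of how to collect that (the omap path below is an
> assumption based on the default filestore layout, and the OSD must be
> stopped while the tool runs):
>
> # systemctl stop ceph-osd@3
> # ceph-osdomap-tool --omap-path /var/lib/ceph/osd/ceph-3/current/omap \
>       --command dump-objects-with-keys > osd3-omap.txt
> # systemctl start ceph-osd@3
>
> Repeat for OSDs 10, 11 and 23, then diff the four dumps.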
>
> ***WARNING*** Please backup your data before doing any repair attempts.
>
> If you can upgrade to Kraken v11.2.0, it will auto repair the omaps on
> ceph-osd start up.  It will likely still require a ceph pg repair to make
> the 4 replicas consistent with each other.  The final result may be the
> reappearance of removed MDS files in the directory.
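>
> Roughly, the order of operations would be (illustrative only; pg 2.9 and
> the OSD ids are taken from your report): upgrade the packages, restart the
> OSDs so the omaps get rebuilt on start up, then repair and re-scrub:
>
> # systemctl restart ceph-osd@3 ceph-osd@10 ceph-osd@11 ceph-osd@23
> # ceph pg repair 2.9
> # ceph pg deep-scrub 2.9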
>
> If you can recover the data, you could remove the directory entirely and
> rebuild it.  The original bug was typically triggered during omap deletion
> in a large directory, which corresponds to an individual unlink in CephFS.
>
> If you can build a branch from GitHub to get the newer ceph-osdomap-tool,
> you could try using it to repair the omaps.
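>
> A sketch of what that might look like, assuming the newer tool exposes a
> repair command (check its --help; the command name in the branch you build
> may differ):
>
> # ceph-osdomap-tool --omap-path /var/lib/ceph/osd/ceph-3/current/omap \
>       --command repair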
>
> David
>
>
> On 5/2/17 5:05 AM, James Eckersall wrote:
>
> Hi,
>
> I'm having some issues with a Ceph cluster.  It's an 8-node cluster running
> Jewel (ceph-10.2.7-0.el7.x86_64) on CentOS 7.
> This cluster provides RBDs and a CephFS filesystem to a number of clients.
>
> ceph health detail is showing the following errors:
>
> pg 2.9 is active+clean+inconsistent, acting [3,10,11,23]
> 1 scrub errors
> mds0: Metadata damage detected
>
>
> PG 2.9 is in the cephfs_metadata pool (id 2).
>
> I've looked at the OSD logs for OSD 3, which is the primary for this PG,
> but the only entry relating to this PG is the following:
>
> log_channel(cluster) log [ERR] : 2.9 deep-scrub 1 errors
>
> After initiating a ceph pg repair 2.9, I see the following in the primary
> OSD log:
>
> log_channel(cluster) log [ERR] : 2.9 repair 1 errors, 0 fixed
> log_channel(cluster) log [ERR] : 2.9 deep-scrub 1 errors
>
>
> I found the below command in a previous ceph-users post.  Running this
> returns the following:
>
> # rados list-inconsistent-obj 2.9
> {"epoch":23738,"inconsistents":[{"object":{"name":"10000411194.00000000","nspace":"","locator":"","snap":"head","version":14737091},"errors":["omap_digest_mismatch"],"union_shard_errors":[],"selected_object_info":"2:9758b358:::10000411194.00000000:head(33456'14737091
> mds.0.214448:248532 dirty|omap|data_digest s 0 uv 14737091 dd
> ffffffff)","shards":[{"osd":3,"errors":[],"size":0,"omap_digest":"0x6748eef3","data_digest":"0xffffffff"},{"osd":10,"errors":[],"size":0,"omap_digest":"0xa791d5a4","data_digest":"0xffffffff"},{"osd":11,"errors":[],"size":0,"omap_digest":"0x53f46ab0","data_digest":"0xffffffff"},{"osd":23,"errors":[],"size":0,"omap_digest":"0x97b80594","data_digest":"0xffffffff"}]}]}
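>
> (For readability, the per-shard digests can also be pulled out with
> something like the following; jq is not part of Ceph, just a convenience:)
>
> # rados list-inconsistent-obj 2.9 --format=json-pretty | jq '.inconsistents[].shards[] | {osd, omap_digest}'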
>
>
> So from this, I think that the object in PG 2.9 with the problem is
> 10000411194.00000000.
>
> This is what I see on the filesystem on the 4 OSDs this PG resides on:
>
> -rw-r--r--. 1 ceph ceph 0 Apr 27 12:31
> /var/lib/ceph/osd/ceph-3/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
> -rw-r--r--. 1 ceph ceph 0 Apr 15 22:05
> /var/lib/ceph/osd/ceph-10/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
> -rw-r--r--. 1 ceph ceph 0 Apr 15 22:07
> /var/lib/ceph/osd/ceph-11/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
> -rw-r--r--. 1 ceph ceph 0 Apr 16 03:58
> /var/lib/ceph/osd/ceph-23/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
>
> The extended attrs are as follows, although I have no idea what any of them
> mean.
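>
> (A dump like this can be reproduced with plain getfattr, e.g.:)
>
> # getfattr -d /var/lib/ceph/osd/ceph-3/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2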
>
> # file:
> var/lib/ceph/osd/ceph-11/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
> user.ceph._=0sDwj5AAAABAM1AAAAAAAAABQAAAAxMDAwMDQxMTE5NC4wMDAwMDAwMP7/////////6RrNGgAAAAAAAgAAAAAAAAAGAxwAAAACAAAAAAAAAP////8AAAAAAAAAAP//////////AAAAABUn4QAAAAAAu4IAAK4m4QAAAAAAu4IAAAICFQAAAAIAAAAAAAAAAOSZDAAAAAAAsEUDAAAAAAAAAAAAjUoIWUgWsQQCAhUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAVJ+EAAAAAAAAAAAAAAAAAABwAAACNSghZESm8BP///w==
> user.ceph._@1=0s//////8=
> user.ceph._layout=0sAgIYAAAAAAAAAAAAAAAAAAAA//////////8AAAAA
> user.ceph._parent=0sBQRPAQAAlBFBAAABAAAIAAAAAgIjAAAAjxFBAAABAAAPAAAAdHViZWFtYXRldXIubmV0qdgAAAAAAAACAh0AAAB/EUEAAAEAAAkAAAB3cC1yb2NrZXREAAAAAAAAAAICGQAAABYNQQAAAQAABQAAAGNhY2hlUgAAAAAAAAACAh4AAAAQDUEAAAEAAAoAAAB3cC1jb250ZW50NAMAAAAAAAACAhgAAAANDUEAAAEAAAQAAABodG1sIAEAAAAAAAACAikAAADagTMAAAEAABUAAABuZ2lueC1waHA3LWNsdmdmLWRhdGGJAAAAAAAAAAICMwAAADkAAAAAAQ==
> user.ceph._parent@1
> =0sAAAfAAAANDg4LTU3YjI2NTdmMmZhMTMtbWktcHJveWVjdG8tMXSQCAAAAAAAAgIcAAAAAQAAAAAAAAAIAAAAcHJvamVjdHPBAgcAAAAAAAIAAAAAAAAAAAAAAA==
> user.ceph.snapset=0sAgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==
> user.cephos.seq=0sAQEQAAAAgcAqFAAAAAAAAAAAAgAAAAA=
> user.cephos.spill_out=0sMAA=
>
> # file:
> var/lib/ceph/osd/ceph-3/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
> user.ceph._=0sDwj5AAAABAM1AAAAAAAAABQAAAAxMDAwMDQxMTE5NC4wMDAwMDAwMP7/////////6RrNGgAAAAAAAgAAAAAAAAAGAxwAAAACAAAAAAAAAP////8AAAAAAAAAAP//////////AAAAABUn4QAAAAAAu4IAAK4m4QAAAAAAu4IAAAICFQAAAAIAAAAAAAAAAOSZDAAAAAAAsEUDAAAAAAAAAAAAjUoIWUgWsQQCAhUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAVJ+EAAAAAAAAAAAAAAAAAABwAAACNSghZESm8BP///w==
> user.ceph._@1=0s//////8=
> user.ceph._layout=0sAgIYAAAAAAAAAAAAAAAAAAAA//////////8AAAAA
> user.ceph._parent=0sBQRPAQAAlBFBAAABAAAIAAAAAgIjAAAAjxFBAAABAAAPAAAAdHViZWFtYXRldXIubmV0qdgAAAAAAAACAh0AAAB/EUEAAAEAAAkAAAB3cC1yb2NrZXREAAAAAAAAAAICGQAAABYNQQAAAQAABQAAAGNhY2hlUgAAAAAAAAACAh4AAAAQDUEAAAEAAAoAAAB3cC1jb250ZW50NAMAAAAAAAACAhgAAAANDUEAAAEAAAQAAABodG1sIAEAAAAAAAACAikAAADagTMAAAEAABUAAABuZ2lueC1waHA3LWNsdmdmLWRhdGGJAAAAAAAAAAICMwAAADkAAAAAAQ==
> user.ceph._parent@1
> =0sAAAfAAAANDg4LTU3YjI2NTdmMmZhMTMtbWktcHJveWVjdG8tMXSQCAAAAAAAAgIcAAAAAQAAAAAAAAAIAAAAcHJvamVjdHPBAgcAAAAAAAIAAAAAAAAAAAAAAA==
> user.ceph.snapset=0sAgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==
> user.cephos.seq=0sAQEQAAAAZaQ9GwAAAAAAAAAAAgAAAAA=
> user.cephos.spill_out=0sMAA=
>
> # file:
> var/lib/ceph/osd/ceph-10/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
> user.ceph._=0sDwj5AAAABAM1AAAAAAAAABQAAAAxMDAwMDQxMTE5NC4wMDAwMDAwMP7/////////6RrNGgAAAAAAAgAAAAAAAAAGAxwAAAACAAAAAAAAAP////8AAAAAAAAAAP//////////AAAAABUn4QAAAAAAu4IAAK4m4QAAAAAAu4IAAAICFQAAAAIAAAAAAAAAAOSZDAAAAAAAsEUDAAAAAAAAAAAAjUoIWUgWsQQCAhUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAVJ+EAAAAAAAAAAAAAAAAAABwAAACNSghZESm8BP///w==
> user.ceph._@1=0s//////8=
> user.ceph._layout=0sAgIYAAAAAAAAAAAAAAAAAAAA//////////8AAAAA
> user.ceph._parent=0sBQRPAQAAlBFBAAABAAAIAAAAAgIjAAAAjxFBAAABAAAPAAAAdHViZWFtYXRldXIubmV0qdgAAAAAAAACAh0AAAB/EUEAAAEAAAkAAAB3cC1yb2NrZXREAAAAAAAAAAICGQAAABYNQQAAAQAABQAAAGNhY2hlUgAAAAAAAAACAh4AAAAQDUEAAAEAAAoAAAB3cC1jb250ZW50NAMAAAAAAAACAhgAAAANDUEAAAEAAAQAAABodG1sIAEAAAAAAAACAikAAADagTMAAAEAABUAAABuZ2lueC1waHA3LWNsdmdmLWRhdGGJAAAAAAAAAAICMwAAADkAAAAAAQ==
> user.ceph._parent@1
> =0sAAAfAAAANDg4LTU3YjI2NTdmMmZhMTMtbWktcHJveWVjdG8tMXSQCAAAAAAAAgIcAAAAAQAAAAAAAAAIAAAAcHJvamVjdHPBAgcAAAAAAAIAAAAAAAAAAAAAAA==
> user.ceph.snapset=0sAgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==
> user.cephos.seq=0sAQEQAAAA1T1dEQAAAAAAAAAAAgAAAAA=
> user.cephos.spill_out=0sMAA=
>
> # file:
> var/lib/ceph/osd/ceph-23/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
> user.ceph._=0sDwj5AAAABAM1AAAAAAAAABQAAAAxMDAwMDQxMTE5NC4wMDAwMDAwMP7/////////6RrNGgAAAAAAAgAAAAAAAAAGAxwAAAACAAAAAAAAAP////8AAAAAAAAAAP//////////AAAAABUn4QAAAAAAu4IAAK4m4QAAAAAAu4IAAAICFQAAAAIAAAAAAAAAAOSZDAAAAAAAsEUDAAAAAAAAAAAAjUoIWUgWsQQCAhUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAVJ+EAAAAAAAAAAAAAAAAAABwAAACNSghZESm8BP///w==
> user.ceph._@1=0s//////8=
> user.ceph._layout=0sAgIYAAAAAAAAAAAAAAAAAAAA//////////8AAAAA
> user.ceph._parent=0sBQRPAQAAlBFBAAABAAAIAAAAAgIjAAAAjxFBAAABAAAPAAAAdHViZWFtYXRldXIubmV0qdgAAAAAAAACAh0AAAB/EUEAAAEAAAkAAAB3cC1yb2NrZXREAAAAAAAAAAICGQAAABYNQQAAAQAABQAAAGNhY2hlUgAAAAAAAAACAh4AAAAQDUEAAAEAAAoAAAB3cC1jb250ZW50NAMAAAAAAAACAhgAAAANDUEAAAEAAAQAAABodG1sIAEAAAAAAAACAikAAADagTMAAAEAABUAAABuZ2lueC1waHA3LWNsdmdmLWRhdGGJAAAAAAAAAAICMwAAADkAAAAAAQ==
> user.ceph._parent@1
> =0sAAAfAAAANDg4LTU3YjI2NTdmMmZhMTMtbWktcHJveWVjdG8tMXSQCAAAAAAAAgIcAAAAAQAAAAAAAAAIAAAAcHJvamVjdHPBAgcAAAAAAAIAAAAAAAAAAAAAAA==
> user.ceph.snapset=0sAgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==
> user.cephos.seq=0sAQEQAAAADiM7AAAAAAAAAAAAAgAAAAA=
> user.cephos.spill_out=0sMAA=
>
>
> For the metadata damage issue, I can get the list of damaged inodes with
> the command below.
>
> $ ceph tell mds.0 damage ls | python -m "json.tool"
> [
>     {
>         "damage_type": "dir_frag",
>         "frag": "*",
>         "id": 5129156,
>         "ino": 1099556021325
>     },
>     {
>         "damage_type": "dir_frag",
>         "frag": "*",
>         "id": 8983971,
>         "ino": 1099548098243
>     },
>     {
>         "damage_type": "dir_frag",
>         "frag": "*",
>         "id": 33278608,
>         "ino": 1099548257921
>     },
>     {
>         "damage_type": "dir_frag",
>         "frag": "*",
>         "id": 33455691,
>         "ino": 1099548271575
>     },
>     {
>         "damage_type": "dir_frag",
>         "frag": "*",
>         "id": 38203788,
>         "ino": 1099548134708
>     },
> ...
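>
> (For reference: each damaged ino corresponds to an object in the metadata
> pool named by the inode number in hex plus a fragment suffix, 00000000 for
> an unfragmented directory, so an entry above can be tied back to a rados
> object with e.g.:)
>
> $ printf '%x.00000000\n' 1099556021325
> 10002a5644d.00000000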
>
> All of the inodes (approx. 800 of them) are for various directories within
> a WordPress cache directory.
> I ran rm -rf on each of the directories as I do not need the content.
> The files inside were removed, but the directories themselves cannot be
> removed: rmdir reports they are not empty, even though ls shows 0 files
> in them.
>
> I'm not sure if these two issues are related to each other.  They were
> noticed within a day of each other.  I think the metadata damage error
> appeared before the scrub error.
>
> I'm at a bit of a loss with how to proceed and I don't want to make things
> worse.
>
> I'd really appreciate any help that anyone can give to try and resolve
> these problems.
>
> Thanks
>
> J
>
>
>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
