Hi again,

two weeks later I have another inconsistent PG in the same cluster. The OSDs are different from those of the first PG, and this object cannot be fetched with "rados get" either:

# rados list-inconsistent-obj 26.821 --format=json-pretty
{
    "epoch": 178472,
    "inconsistents": [
        {
            "object": {
                "name": "default.122888368.52__shadow_.3ubGZwLcz0oQ55-LTb7PCOTwKkv-nQf_7",
                "nspace": "",
                "locator": "",
                "snap": "head",
                "version": 118920
            },
            "errors": [],
            "union_shard_errors": [
                "data_digest_mismatch_oi"
            ],
            "selected_object_info": "26:8411bae4:::default.122888368.52__shadow_.3ubGZwLcz0oQ55-LTb7PCOTwKkv-nQf_7:head(126495'118920 client.142609570.0:41412640 dirty|data_digest|omap_digest s 4194304 uv 118920 dd cd142aaa od ffffffff alloc_hint [0 0])",
            "shards": [
                {
                    "osd": 20,
                    "errors": [
                        "data_digest_mismatch_oi"
                    ],
                    "size": 4194304,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x6b102e59"
                },
                {
                    "osd": 44,
                    "errors": [
                        "data_digest_mismatch_oi"
                    ],
                    "size": 4194304,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x6b102e59"
                }
            ]
        }
    ]
}

# rados -p .rgw.buckets get default.122888368.52__shadow_.3ubGZwLcz0oQ55-LTb7PCOTwKkv-nQf_7 test_2pg.file
error getting .rgw.buckets/default.122888368.52__shadow_.3ubGZwLcz0oQ55-LTb7PCOTwKkv-nQf_7: (5) Input/output error

I am still struggling with how to solve it. Any ideas, guys?

Thank you

On Tue, Jul 24, 2018 at 10:27 AM, Arvydas Opulskis <zebedie...@gmail.com> wrote:

> Hello, Cephers,
>
> after trying different repair approaches I am out of ideas on how to
> repair an inconsistent PG. I hope someone's sharp eye will notice what I
> overlooked.
>
> Some info about the cluster:
> Centos 7.4
> Jewel 10.2.10
> Pool size 2 (yes, I know it's a very bad choice)
> Pool with inconsistent PG: .rgw.buckets
>
> After a routine deep-scrub I found PG 26.c3f in an inconsistent state.
> While running "ceph pg repair 26.c3f" and watching the "ceph -w" log, I
> noticed these errors:
>
> 2018-07-24 08:28:06.517042 osd.36 [ERR] 26.c3f shard 30: soid 26:fc32a1f1:::default.142609570.87_20180206.093111%2frepositories%2fnuget-local%2fApplication%2fCompany.Application.Api%2fCompany.Application.Api.1.1.1.nupkg.artifactory-metadata%2fproperties.xml:head data_digest 0x540e4f8b != data_digest 0x49a34c1f from auth oi 26:e261561a:::default.168602061.10_team-xxx.xxx-jobs.H6.HADOOP.data-segmentation.application.131.xxx-jvm.cpu.load%2f2018-05-05T03%3a51%3a39+00%3a00.sha1:head(167828'216051 client.179334015.0:1847715760 dirty|data_digest|omap_digest s 40 uv 216051 dd 49a34c1f od ffffffff alloc_hint [0 0])
>
> 2018-07-24 08:28:06.517118 osd.36 [ERR] 26.c3f shard 36: soid 26:fc32a1f1:::default.142609570.87_20180206.093111%2frepositories%2fnuget-local%2fApplication%2fCompany.Application.Api%2fCompany.Application.Api.1.1.1.nupkg.artifactory-metadata%2fproperties.xml:head data_digest 0x540e4f8b != data_digest 0x49a34c1f from auth oi 26:e261561a:::default.168602061.10_team-xxx.xxx-jobs.H6.HADOOP.data-segmentation.application.131.xxx-jvm.cpu.load%2f2018-05-05T03%3a51%3a39+00%3a00.sha1:head(167828'216051 client.179334015.0:1847715760 dirty|data_digest|omap_digest s 40 uv 216051 dd 49a34c1f od ffffffff alloc_hint [0 0])
>
> 2018-07-24 08:28:06.517122 osd.36 [ERR] 26.c3f soid 26:fc32a1f1:::default.142609570.87_20180206.093111%2frepositories%2fnuget-local%2fApplication%2fCompany.Application.Api%2fCompany.Application.Api.1.1.1.nupkg.artifactory-metadata%2fproperties.xml:head: failed to pick suitable auth object
>
> ...and the same errors about another object in the same PG.
>
> The repair failed, so I checked the inconsistencies with "rados
> list-inconsistent-obj 26.c3f --format=json-pretty":
>
> {
>     "epoch": 178403,
>     "inconsistents": [
>         {
>             "object": {
>                 "name": "default.142609570.87_20180203.020047\/repositories\/docker-local\/yyy\/company.yyy.api.assets\/1.2.4\/sha256__ce41e5246ead8bddd2a2b5bbb863db250f328be9dc5c3041481d778a32f8130d",
>                 "nspace": "",
>                 "locator": "",
>                 "snap": "head",
>                 "version": 217749
>             },
>             "errors": [],
>             "union_shard_errors": [
>                 "data_digest_mismatch_oi"
>             ],
>             "selected_object_info": "26:f4ce1748:::default.168602061.10_team-xxx.xxx-jobs.H6.HADOOP.data-segmentation.application.131.xxx-jvm.cpu.load%2f2018-05-08T03%3a45%3a15+00%3a00.sha1:head(167944'217749 client.177936559.0:1884719302 dirty|data_digest|omap_digest s 40 uv 217749 dd 422f251b od ffffffff alloc_hint [0 0])",
>             "shards": [
>                 {
>                     "osd": 30,
>                     "errors": [
>                         "data_digest_mismatch_oi"
>                     ],
>                     "size": 40,
>                     "omap_digest": "0xffffffff",
>                     "data_digest": "0x551c282f"
>                 },
>                 {
>                     "osd": 36,
>                     "errors": [
>                         "data_digest_mismatch_oi"
>                     ],
>                     "size": 40,
>                     "omap_digest": "0xffffffff",
>                     "data_digest": "0x551c282f"
>                 }
>             ]
>         },
>         {
>             "object": {
>                 "name": "default.142609570.87_20180206.093111\/repositories\/nuget-local\/Application\/Company.Application.Api\/Company.Application.Api.1.1.1.nupkg.artifactory-metadata\/properties.xml",
>                 "nspace": "",
>                 "locator": "",
>                 "snap": "head",
>                 "version": 216051
>             },
>             "errors": [],
>             "union_shard_errors": [
>                 "data_digest_mismatch_oi"
>             ],
>             "selected_object_info": "26:e261561a:::default.168602061.10_team-xxx.xxx-jobs.H6.HADOOP.data-segmentation.application.131.xxx-jvm.cpu.load%2f2018-05-05T03%3a51%3a39+00%3a00.sha1:head(167828'216051 client.179334015.0:1847715760 dirty|data_digest|omap_digest s 40 uv 216051 dd 49a34c1f od ffffffff alloc_hint [0 0])",
>             "shards": [
>                 {
>                     "osd": 30,
>                     "errors": [
>                         "data_digest_mismatch_oi"
>                     ],
>                     "size": 40,
>                     "omap_digest": "0xffffffff",
>                     "data_digest": "0x540e4f8b"
>                 },
>                 {
>                     "osd": 36,
>                     "errors": [
>                         "data_digest_mismatch_oi"
>                     ],
>                     "size": 40,
>                     "omap_digest": "0xffffffff",
>                     "data_digest": "0x540e4f8b"
>                 }
>             ]
>         }
>     ]
> }
>
> After some reading, I understood that I needed the rados get/put trick to
> solve this problem. I couldn't do a "rados get", because I was getting a
> "no such file" error even though the objects were listed by "rados ls",
> so I copied them directly from the OSD. After putting them back with
> "rados put" (the rados commands didn't return any errors) and running a
> deep-scrub on the same PG, the problem still existed. The only thing that
> changed: when I try to get the object via rados now, I get "(5)
> Input/output error".
>
> I tried to force the object size to 40 (the real size of both objects) by
> adding the "-o 40" option to the "rados put" command, but with no luck.
>
> Guys, maybe you have other ideas about what to try? Why doesn't
> overwriting the object solve this problem?
>
> Thanks a lot!
>
> Arvydas
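
To make the pattern in these reports easier to see, here is a minimal Python sketch (not part of any Ceph tooling) that reads "rados list-inconsistent-obj ... --format=json-pretty" output and compares each shard's data_digest with the digest recorded in selected_object_info. The names summarize and SAMPLE are made up for illustration, and the sketch assumes the JSON layout shown in this thread, in particular that the object info string carries the recorded digest as "dd <hex>" followed by "od". SAMPLE is the 26.821 report with nothing changed except whitespace.

```python
import json
import re

# Report for PG 26.821 as posted in this thread (whitespace condensed).
SAMPLE = r'''
{
  "epoch": 178472,
  "inconsistents": [
    {
      "object": {
        "name": "default.122888368.52__shadow_.3ubGZwLcz0oQ55-LTb7PCOTwKkv-nQf_7",
        "nspace": "", "locator": "", "snap": "head", "version": 118920
      },
      "errors": [],
      "union_shard_errors": ["data_digest_mismatch_oi"],
      "selected_object_info": "26:8411bae4:::default.122888368.52__shadow_.3ubGZwLcz0oQ55-LTb7PCOTwKkv-nQf_7:head(126495'118920 client.142609570.0:41412640 dirty|data_digest|omap_digest s 4194304 uv 118920 dd cd142aaa od ffffffff alloc_hint [0 0])",
      "shards": [
        {"osd": 20, "errors": ["data_digest_mismatch_oi"], "size": 4194304,
         "omap_digest": "0xffffffff", "data_digest": "0x6b102e59"},
        {"osd": 44, "errors": ["data_digest_mismatch_oi"], "size": 4194304,
         "omap_digest": "0xffffffff", "data_digest": "0x6b102e59"}
      ]
    }
  ]
}
'''

# Assumption: the object info string contains "... dd <hex> od <hex> ...".
OI_DIGEST_RE = re.compile(r" dd ([0-9a-f]+) od ")


def summarize(report):
    """Return one summary dict per inconsistent object in the report."""
    rows = []
    for item in report["inconsistents"]:
        match = OI_DIGEST_RE.search(item["selected_object_info"])
        oi_digest = int(match.group(1), 16) if match else None
        # Map OSD id -> data digest computed by scrub on that shard.
        shard_digests = {
            shard["osd"]: int(shard["data_digest"], 16)
            for shard in item["shards"]
        }
        rows.append({
            "name": item["object"]["name"],
            "oi_digest": oi_digest,
            "shard_digests": shard_digests,
            # True when every replica holds data with the same digest.
            "shards_agree": len(set(shard_digests.values())) == 1,
            # True when at least one replica matches the object info digest.
            "any_matches_oi": oi_digest in shard_digests.values(),
        })
    return rows


if __name__ == "__main__":
    for row in summarize(json.loads(SAMPLE)):
        print(row["name"])
        print("  object info digest:", hex(row["oi_digest"]))
        for osd, digest in sorted(row["shard_digests"].items()):
            print("  osd.%d data digest: %s" % (osd, hex(digest)))
        print("  replicas agree with each other:", row["shards_agree"])
        print("  any replica matches object info:", row["any_matches_oi"])
```

For all three objects reported in this thread, the two replicas carry the same data_digest as each other and only the digest stored in the object info differs. The script only summarizes the report; it does not touch the cluster.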

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com